Using the Alignment API: a small tutorial on the Alignment API

This version:
https://moex.gitlabpages.inria.fr/alignapi/tutorial/tutorial1/
Author:
Jérôme Euzenat, INRIA & Univ. Grenoble Alpes

Here is a small tutorial for the Alignment API. Most of the tutorial is based on command-line invocations. Of course, this is not the natural way to use the API: it is made for being embedded in application programs, and we are working towards implementing an alignment server that can help programs use the API remotely. The complete tutorial is also available as a self-contained script.sh or script.bat. We make no guarantee on the MS-DOS script; it is only provided as a convenience.

A companion tutorial has been designed for the Alignment Server. It follows, as much as possible, the reasoning of this tutorial but provides input and output through a web browser.

This tutorial has been updated for the Alignment API version 4.0 (tutorials for previous versions, starting with 2.4, can be found in the html directory of the corresponding release).

Preparation

First you must download the Alignment API and check that it works as indicated here.

You can modify the Alignment API and its implementation. In this tutorial, we will simply learn how to use it.

You will then go to the directory of this tutorial by doing:

$ cd tutorial1

You can clean up previous trials by:

$ rm results/*

The goal of this tutorial is only to help you realize the possibilities of the Alignment API and implementation. It can be played by invoking each command line from the command-line interpreter. In this example we use the sh syntax (which only affects the export VARIABLE=VALUE command which can be rewritten as setenv VARIABLE VALUE with c-shells).

The data

Your mission, if you accept it, will be to find the best alignment between two bibliographic ontologies. They can be seen here:

edu.mit.visus.bibtex.owl
is a relatively faithful transcription of BibTeX as an ontology. It can be seen here in RDF/XML or HTML.
myOnto.owl
is an extension of the previous one that contains a number of supplementary concepts. It can be seen here in RDF/XML or HTML.

These two ontologies have been used for a few years in the Ontology Alignment Evaluation Initiative.

Matching

For demonstrating the use of our implementation of the Alignment API, we implemented a particular processor (fr.inrialpes.exmo.align.cli.Procalign) which reads two ontologies, computes an alignment between them and displays the result.

Let's try to match these two ontologies ($CWD is a variable that has been set up to the directory just above this one):

$ java -jar ../../../lib/procalign.jar file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl

The result is displayed on the standard output. Since the output is too long we send it to a file by using the -o switch:

$ java -jar ../../../lib/procalign.jar file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -o results/equal.rdf

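Additional options are available, such as -r for choosing the renderer of the output, -i for choosing the alignment method and -t for applying a threshold; all of them are used later in this tutorial.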

Hence, it is possible to display the alignment in HTML by using the adequate renderer:

$ java -jar ../../../lib/procalign.jar file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -r fr.inrialpes.exmo.align.impl.renderer.HTMLRendererVisitor -o results/equal.html
and opening the results/equal.html in a browser.

See the output in RDF/XML or HTML.

The result is expressed in the Alignment format. This format, in RDF/XML, is made of a header containing "metadata" about the alignment:

<?xml version='1.0' encoding='utf-8' standalone='no'?>
<rdf:RDF xmlns='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'
         xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:xsd='http://www.w3.org/2001/XMLSchema#'
         xmlns:align='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'>
<Alignment>
  <xml>yes</xml>
  <level>0</level>
  <type>**</type>
  <method>fr.inrialpes.exmo.align.impl.method.StringDistAlignment</method>
  <time>18</time>
  <onto1>
    <Ontology rdf:about="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl">
      <location>file:///Java/alignapi/html/tutorial/myOnto.owl</location>
      <formalism>
        <Formalism align:name="OWL1.0" align:uri="http://www.w3.org/2002/07/owl#"/>
      </formalism>
    </Ontology>
  </onto1>
  <onto2>
    <Ontology rdf:about="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl">
      <location>file:///Java/alignapi/html/tutorial/edu.mit.visus.bibtex.owl</location>
      <formalism>
        <Formalism align:name="OWL1.0" align:uri="http://www.w3.org/2002/07/owl#"/>
      </formalism>
    </Ontology>
  </onto2>

and the corresponding set of correspondences:

  <map>
    <Cell>
      <entity1 rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl#Article"/>
      <entity2 rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl#Article"/>
      <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure>
      <relation>=</relation>
    </Cell>
  </map>

Each correspondence is made of two references to the aligned entities, the relation holding between the entities (=) and a confidence measure (1.0) in this correspondence. Here, the default method used for aligning the ontologies only compares the labels of the entities and finds a correspondence when their labels are equal, so all correspondences have this same simple form. This is too limited, so we will use a more sophisticated method based on an edit distance:

$ java -jar ../../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -o results/levenshtein.rdf

See the output in RDF/XML or HTML (if rendered as before).

This is achieved by specifying the class of Alignment to be used (through the -i switch) and the distance function to be used (-DstringFunction=levenshteinDistance).
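To give an intuition of what an edit distance measures, here is a minimal Python sketch of the Levenshtein distance turned into a confidence value. The normalization used in similarity (one minus the distance divided by the length of the longer label, on lower-cased labels) is an assumption made for illustration; it is not necessarily the one implemented in the StringDistances class.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer lower-cased label."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(similarity("Article", "Article"))
print(similarity("Published", "UnPublished"))
```

With such a measure, labels that are close but not equal receive a confidence below 1.0 instead of being ignored, which is why more correspondences are returned than with strict label equality.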

Look at the results: how are they different from before?

We can see that the correspondences now contain confidence factors different from 1.0, that they also match strings which are not identical, and that far more correspondences are returned.
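For comparing result files without reading the RDF/XML by hand, the cells can be listed with a few lines of Python. This is only a sketch using the standard xml.etree module, outside of the Alignment API itself; the function names are hypothetical, and the element names and namespace are those of the Alignment format shown earlier.

```python
import xml.etree.ElementTree as ET

ALIGN = "{http://knowledgeweb.semanticweb.org/heterogeneity/alignment#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def read_cells(root):
    """Return (entity1, entity2, relation, measure) for every Cell under root."""
    cells = []
    for cell in root.iter(ALIGN + "Cell"):
        entity1 = cell.find(ALIGN + "entity1").get(RDF + "resource")
        entity2 = cell.find(ALIGN + "entity2").get(RDF + "resource")
        relation = cell.find(ALIGN + "relation").text
        measure = float(cell.find(ALIGN + "measure").text)
        cells.append((entity1, entity2, relation, measure))
    return cells


def read_alignment(path):
    """Parse an alignment file, e.g. results/levenshtein.rdf produced above."""
    return read_cells(ET.parse(path).getroot())


def summarize(cells):
    """Count all correspondences and those whose confidence is below 1.0."""
    return len(cells), sum(1 for cell in cells if cell[3] < 1.0)
```

Calling summarize(read_alignment("results/equal.rdf")) and summarize(read_alignment("results/levenshtein.rdf")) from the tutorial1 directory gives the number of correspondences in each file and how many of them are not fully confident.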

We do the same with another measure (smoaDistance):

$ java -jar ../../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -o results/SMOA.rdf

More work: you can apply other available alignment classes. Look in the ../../../src/fr/inrialpes/exmo/align/impl/method directory for more simple alignment methods. Also look in the StringDistances class for the possible values of stringFunction (they are the names of methods).

Advanced: You can also look at the instructions for installing WordNet and its Java interface, and use a WordNet-based distance provided with the API implementation as follows ($WNDIR is the directory where WordNet 3.0 is installed):

$ java -cp ../../../lib/procalign.jar:../../../lib/jwnl/jwnl.jar fr.inrialpes.exmo.align.cli.Procalign -Dwndict=$WNDIR -i fr.inrialpes.exmo.align.ling.JWNLAlignment file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -o results/jwnl.rdf

See the output in RDF/XML or HTML (if rendered as before).

Manipulating

As can be seen there are some correspondences that do not really make sense. Fortunately, they also have very low confidence values. It is thus interesting to use a threshold for eliminating these values. Let's try a threshold of .33 over the alignment (with the -t switch):

$ java -jar ../../../lib/procalign.jar file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=levenshteinDistance -t 0.33 -o results/levenshtein33.rdf

See the output in RDF/XML or HTML (if rendered as before).

As expected, we have suppressed some of these inaccurate correspondences. But did we also suppress accurate ones?

This operation has helped eliminate a number of inaccurate correspondences like Journal-Conference or Composite-Conference. However, some inaccurate correspondences remain, like Institution-InCollection and Published-UnPublished!
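The effect of a threshold can be sketched in a few lines of Python. The function below keeps the correspondences whose confidence is at least the threshold; whether the bound is inclusive in the API implementation is an assumption here, and the confidence values in the example are made up for illustration, they are not taken from the result files.

```python
def hard_threshold(cells, threshold):
    """Keep the correspondences whose confidence is at least `threshold`."""
    return [cell for cell in cells if cell[2] >= threshold]


# Entity pairs mentioned in the tutorial, with hypothetical confidence values.
cells = [
    ("Article", "Article", 1.0),
    ("Journal", "Conference", 0.2),
    ("Published", "UnPublished", 0.8),
]

kept = hard_threshold(cells, 0.33)
for entity1, entity2, confidence in kept:
    print(entity1, entity2, confidence)
```

A threshold only removes correspondences with low confidence: a wrong correspondence between very similar labels, such as Published-UnPublished, keeps a high confidence and survives any reasonable threshold.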

We can also apply this treatment to other methods available:

$ java -jar ../../../lib/procalign.jar -i fr.inrialpes.exmo.align.impl.method.StringDistAlignment -DstringFunction=smoaDistance file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl -t 0.5 -o results/SMOA5.rdf

See the output in RDF/XML or HTML (if rendered as before).

Other manipulations: It is possible to invert an alignment with the following command:

$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.ParserPrinter -i file:results/SMOA5.rdf -o results/AOMS5.rdf

See the output in RDF/XML or HTML (if rendered as before). The result is an alignment in the opposite direction, from the former target ontology to the former source ontology. Inverting an alignment only exchanges the order of the elements in the alignment file. This can be useful when you have an alignment from A to B and an alignment from C to B, and you want to go from A to C: the solution is then to invert the second alignment and to compose it with the first one.
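The inversion and composition just described can be sketched in Python on alignments represented as lists of (entity1, entity2, confidence) triples. This is an illustration only, restricted to equivalence correspondences; the entity names are hypothetical, and multiplying the confidences when composing is an assumption, not necessarily what the API implementation does.

```python
def invert(alignment):
    """Exchange the two sides of every correspondence."""
    return [(entity2, entity1, confidence)
            for entity1, entity2, confidence in alignment]


def compose(first, second):
    """Compose an alignment from A to B with one from B to C into one from A to C.

    Assumption: the confidence of a composed correspondence is the product
    of the confidences of its two parts.
    """
    by_source = {}
    for b, c, confidence in second:
        by_source.setdefault(b, []).append((c, confidence))
    return [(a, c, conf_ab * conf_bc)
            for a, b, conf_ab in first
            for c, conf_bc in by_source.get(b, [])]


# Hypothetical correspondences between three ontologies A, B and C.
a_to_b = [("A#Paper", "B#Article", 0.9)]
c_to_b = [("C#Publication", "B#Article", 0.8)]

a_to_c = compose(a_to_b, invert(c_to_b))
print(a_to_c)
```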

More work: There is another switch (-T) in Procalign that specifies the way a threshold is applied (hard|perc|prop|best|span) the default being "hard". The curious reader can apply these and see the difference in results. How they work is explained in the Alignment API documentation.

More work: Try to play with the thresholds in order to find the best values for levenshteinDistance and smoaDistance.

Output

Once a good alignment has been found, only half of the work has been done. In order to actually use our result, it is necessary to transform it into some processable format. For instance, if one wants to merge two OWL ontologies, the alignment can be changed into a set of OWL "bridging" axioms. This is achieved by "rendering" the alignment in OWL (through the -r switch):

$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.ParserPrinter file:results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.OWLAxiomsRendererVisitor

The result is a set of OWL assertions of the form:

<owl:Class rdf:about="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl#Techreport">
  <owl:equivalentClass rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl#Techreport"/>
</owl:Class>
<owl:ObjectProperty rdf:about="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl#copyright">
  <owl:equivalentProperty rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl#hasCopyright"/>
</owl:ObjectProperty>
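To make explicit what "rendering" means here, the following Python sketch turns one equivalence correspondence into an axiom of the same shape as those above. It is a hypothetical helper written for illustration, not the code of OWLAxiomsRendererVisitor, and it assumes the caller already knows whether the entities are classes or properties.

```python
from xml.sax.saxutils import quoteattr

# Element names for each kind of entity, following the output shown above.
ELEMENTS = {
    "class": ("owl:Class", "owl:equivalentClass"),
    "property": ("owl:ObjectProperty", "owl:equivalentProperty"),
}


def render_equivalence(kind, entity1, entity2):
    """Render one '=' correspondence as an OWL axiom in RDF/XML."""
    element, axiom = ELEMENTS[kind]
    return (
        f"<{element} rdf:about={quoteattr(entity1)}>\n"
        f"  <{axiom} rdf:resource={quoteattr(entity2)}/>\n"
        f"</{element}>"
    )


BASE = "https://moex.gitlabpages.inria.fr/alignapi/tutorial/"
axiom = render_equivalence(
    "class",
    BASE + "myOnto.owl#Techreport",
    BASE + "edu.mit.visus.bibtex.owl#Techreport",
)
print(axiom)
```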

If one wants to use the alignment only for inferring on instances without actually merging the classes, one can generate SWRL rules:

$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.ParserPrinter file:results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.SWRLRendererVisitor

which yields, for the same correspondences:

<ruleml:imp>
  <ruleml:_body>
    <swrl:classAtom>
      <owllx:Class owllx:name="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl#Techreport"/>
      <ruleml:var>x</ruleml:var>
    </swrl:classAtom>
  </ruleml:_body>
  <ruleml:_head>
    <swrl:classAtom>
      <owllx:Class owllx:name="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl#Techreport"/>
      <ruleml:var>x</ruleml:var>
    </swrl:classAtom>
  </ruleml:_head>
</ruleml:imp>
<ruleml:imp>
  <ruleml:_body>
    <swrl:individualPropertyAtom swrlx:property="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl#copyright">
      <ruleml:var>x</ruleml:var>
      <ruleml:var>y</ruleml:var>
    </swrl:individualPropertyAtom>
  </ruleml:_body>
  <ruleml:_head>
    <swrl:datavaluedPropertyAtom swrlx:property="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl#hasCopyright">
      <ruleml:var>x</ruleml:var>
      <ruleml:var>y</ruleml:var>
    </swrl:datavaluedPropertyAtom>
  </ruleml:_head>
</ruleml:imp>

Exchanging data can also be achieved more simply through XSLT transformations which will transform the OWL instance files from one ontology to another:

$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.ParserPrinter file:results/SMOA5.rdf -r fr.inrialpes.exmo.align.impl.renderer.XSLTRendererVisitor -o results/SMOA5.xsl

This transformation can be applied to the data in data.xml:

$ xsltproc results/SMOA5.xsl data.xml > results/data.xml

which produces the results/data.xml file.

Evaluating

We will evaluate alignments by comparing them to some reference alignment which is supposed to express what is expected from an alignment of these two ontologies. The reference alignment is refalign.rdf (or HTML, if rendered as before).

For evaluating, we use a class other than Procalign. It is called EvalAlign and its name must be given to java explicitly. By default, it computes precision, recall and associated measures. It can be invoked this way:

$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://$CWD/refalign.rdf file://$CWD/tutorial1/results/equal.rdf

The first argument is always the reference alignment, the second one is the alignment to be evaluated. The result is given here:

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:map='http://www.atl.external.lmco.com/projects/ontology/ResultsOntology.n3#'>
  <map:output rdf:about=''>
    <map:input1 rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl"/>
    <map:input2 rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl"/>
    <map:precision>1.0</map:precision>
    <map:recall>0.3541666666666667</map:recall>
    <fallout>0.0</fallout>
    <map:fMeasure>0.5230769230769231</map:fMeasure>
    <map:oMeasure>0.3541666666666667</map:oMeasure>
    <time>22</time>
    <result>0.3541666666666667</result>
  </map:output>
</rdf:RDF>

Of course, since this method only matches objects with the same name, it is accurate, yielding a high precision. However, it has poor recall.

We can now evaluate the edit distance. What to expect from the evaluation of this alignment?

Since it returns more correspondences by loosening the constraints for being a correspondence, it is expected that the recall will increase at the expense of precision.

We can see the results of:

$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.EvalAlign -i fr.inrialpes.exmo.align.impl.eval.PRecEvaluator file://$CWD/refalign.rdf file://$CWD/tutorial1/results/levenshtein33.rdf

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:map='http://www.atl.external.lmco.com/projects/ontology/ResultsOntology.n3#'>
  <map:output rdf:about=''>
    <map:input1 rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/myOnto.owl"/>
    <map:input2 rdf:resource="https://moex.gitlabpages.inria.fr/alignapi/tutorial/edu.mit.visus.bibtex.owl"/>
    <map:precision>0.6486486486486487</map:precision>
    <map:recall>1.0</map:recall>
    <fallout>0.35135135135135137</fallout>
    <map:fMeasure>0.7868852459016393</map:fMeasure>
    <map:oMeasure>0.4583333333333335</map:oMeasure>
    <result>1.5416666666666665</result>
  </map:output>
</rdf:RDF>
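The measures reported by the evaluator can be recomputed by hand. The Python sketch below uses the usual definitions of precision, recall and F-measure; the formula given for oMeasure is an assumption inferred from the reported values, with which it agrees for both evaluations above. Correspondences are represented as arbitrary hashable items (for instance pairs of entity URIs), which is a choice made for this illustration.

```python
def precision_recall(found, reference):
    """Precision and recall of a set of found correspondences against a reference set."""
    correct = len(found & reference)
    # An empty set leads to a division by zero, reported as NaN.
    precision = correct / len(found) if found else float("nan")
    recall = correct / len(reference) if reference else float("nan")
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overall(precision, recall):
    """Assumed formula for oMeasure: recall * (2 - 1 / precision)."""
    return recall * (2 - 1 / precision)


# Precision and recall reported above for equal.rdf and levenshtein33.rdf.
for name, p, r in [("equal", 1.0, 0.3541666666666667),
                   ("levenshtein33", 0.6486486486486487, 1.0)]:
    print(name, f_measure(p, r), overall(p, r))
```

The edit distance alignment loses precision but gains so much recall that its F-measure is clearly higher than that of the strict label equality.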

It is possible to summarize these results by comparing them to each other. This can be achieved with the GroupEval class. This class can output several formats (HTML by default) and takes all the alignments in the subdirectories of the current directory. Here we only have the results directory:

$ cp ../refalign.rdf results
$ java -cp ../../../lib/procalign.jar fr.inrialpes.exmo.align.cli.GroupEval -r refalign.rdf -l "refalign,equal,SMOA,SMOA5,levenshtein,levenshtein33" -c -f prm -o results/eval.html

The results are displayed in the results/eval.html file whose main content is the table:

algo     refalign              equal                 SMOA                  SMOA5                 levenshtein           levenshtein33
test     Prec.  Rec.   FMeas.  Prec.  Rec.   FMeas.  Prec.  Rec.   FMeas.  Prec.  Rec.   FMeas.  Prec.  Rec.   FMeas.  Prec.  Rec.   FMeas.
results  1.00   1.00   1.00    1.00   0.35   0.52    0.57   0.98   0.72    0.72   0.98   0.83    0.55   1.00   0.71    0.65   1.00   0.79
H-mean   1.00   1.00   1.00    1.00   0.35   0.52    0.57   0.98   0.72    0.72   0.98   0.83    0.55   1.00   0.71    0.65   1.00   0.79

n/a: result alignment not provided or not readable
NaN: division by zero, likely due to empty alignment.

More work: As you can see, the PRecEvaluator provides not only precision and recall but also F-measure. F-measure is usually used as an "absolute" trade-off between precision and recall (i.e., the alignment with the highest F-measure is considered to offer the best combination of precision and recall). Can you find this optimum for SMOA and levenshtein and tell which algorithm is better suited?

Further exercises

More info: https://moex.gitlabpages.inria.fr/alignapi/tutorial/

