Manipulating alignments in Java programs (and embedding the Alignment API within an application)

This version: https://moex.gitlabpages.inria.fr/alignapi/tutorial/tutorial2/
Author: Jérôme Euzenat, INRIA & Univ. Grenoble Alpes

Here is a tutorial for embedding the alignment API within your own applications.

This tutorial has been designed for the Alignment API version 4.0.

The Alignment API is not only meant to be used from the command line, useful as that is. If you are ready for it, you can develop your own Java application that takes advantage of the API.

Starting point

A skeleton program using the Alignment API is Skeleton.java. It can be compiled by invoking:

$ javac -classpath ../../../lib/align.jar:../../../lib/procalign.jar -d results Skeleton.java

and run by:

$ java -cp ../../../lib/procalign.jar:results Skeleton file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl

With the API at hand (it can be consulted through its thin Javadoc, for instance), the goal is to modify the Skeleton program so that it performs the following:

Run two different alignment methods (e.g., ngram distance and smoa);
Merge the two results;
Trim at various thresholds;
Evaluate them against the reference alignment and choose the one with the best F-Measure;
Display the result as OWL axioms.

Of course, you can do it progressively.

Call an alignment method

Matching two ontologies is achieved in three steps:

creating an instance of the class implementing the expected method (implementing the AlignmentProcess interface);
providing the two ontologies to match to this instance (init);
calling the matching method (align).

The matching method takes two arguments: an optional alignment to improve on (which can be null) and a set of parameters. So, the three lines below match ontologies onto1 and onto2 with the StringDistAlignment matcher:

AlignmentProcess a = new StringDistAlignment();
a.init ( onto1, onto2 );
a.align( (Alignment)null, new Properties() );

Now, the first step is to run two different instances of StringDistAlignment with different stringFunction parameters corresponding to "smoaDistance" and "ngramDistance".

Properties params = new Properties();
// Run two different alignment methods (e.g., ngram distance and smoa)
AlignmentProcess a1 = new StringDistAlignment();
params.setProperty("stringFunction","smoaDistance");
a1.init ( onto1, onto2 );
a1.align( (Alignment)null, params );
AlignmentProcess a2 = new StringDistAlignment();
a2.init ( onto1, onto2 );
params = new Properties();
params.setProperty("stringFunction","ngramDistance");
a2.align( (Alignment)null, params );

After this step, the two matching methods have been processed and the result is available within the alignment instances (a1 and a2).
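To give an intuition of what a string-based matcher compares, here is a minimal stand-alone sketch of a character n-gram similarity between two entity names. It is not the code behind the "ngramDistance" parameter: the class name NGramSketch, the use of trigrams and the Dice-style normalisation are all assumptions made for this illustration, and the measure used by the API may be defined differently.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: NOT the Alignment API's ngramDistance implementation.
class NGramSketch {

    // Collect the set of character n-grams of a lower-cased string.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        String t = s.toLowerCase();
        for (int i = 0; i + n <= t.length(); i++) {
            grams.add(t.substring(i, i + n));
        }
        return grams;
    }

    // Dice-style similarity in [0,1]: 2 * |shared| / (|A| + |B|).
    static double similarity(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            // Both names are shorter than n: fall back to plain equality.
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return 2.0 * shared.size() / (ga.size() + gb.size());
    }

    public static void main(String[] args) {
        System.out.println(similarity("Article", "Article", 3));
        System.out.println(similarity("Article", "Particle", 3));
        System.out.println(similarity("Book", "Person", 3));
    }
}
```

A matcher built on such a measure would compare the names of the entities of both ontologies and keep the best scoring pairs as correspondences, with the score as confidence.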

Manipulate alignments (merge and trim)

The Alignment interface offers methods to manipulate alignments. In particular, it is possible to

clone alignments (clone()),
invert alignments (change the order of ontologies, inverse()),
merge alignments (put all their correspondences together, ingest(Alignment)),
trim them under various thresholds and threshold modalities (cut(threshold)).

Of these, cloning and inverting create a new alignment, while the other operations modify the alignment in place.

The goal of the exercise is to create a copy of alignment a1, merge it with a2, invert the order of the ontologies, and finally trim the result under a threshold of .5. At each step, display the number of correspondences in the resulting alignment.

// Clone a1
System.err.println( a1.nbCells() );
BasicAlignment a3 = (BasicAlignment)(a1.clone());
System.err.println( a3.nbCells() );
// Merge the two results.
a3.ingest( a2 );
System.err.println( a3.nbCells() );
// Invert the alignment
Alignment a4 = a3.inverse();
System.err.println( a4.nbCells() );
// Trim at a threshold of .5
a4.cut( .5 );
System.err.println( a4.nbCells() );
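The difference between operations that return a new alignment and operations that modify their target can be sketched with a toy model that needs no library. The ToyAlignment class below is hypothetical and is not part of the Alignment API. It assumes that trimming keeps the correspondences whose confidence is at or above the threshold, and that merging simply appends correspondences without removing duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an alignment: NOT the Alignment API classes.
class ToyAlignment {

    // One correspondence between two entity names, with a confidence in [0,1].
    static final class Cell {
        final String left;
        final String right;
        final double confidence;
        Cell(String left, String right, double confidence) {
            this.left = left;
            this.right = right;
            this.confidence = confidence;
        }
    }

    final List<Cell> cells = new ArrayList<>();

    ToyAlignment add(String left, String right, double confidence) {
        cells.add(new Cell(left, right, confidence));
        return this;
    }

    int nbCells() { return cells.size(); }

    // Returns a NEW alignment holding the same cells (plays the role of clone()).
    ToyAlignment copy() {
        ToyAlignment c = new ToyAlignment();
        c.cells.addAll(cells);
        return c;
    }

    // Returns a NEW alignment with both sides swapped (plays the role of inverse()).
    ToyAlignment inverse() {
        ToyAlignment inv = new ToyAlignment();
        for (Cell c : cells) inv.add(c.right, c.left, c.confidence);
        return inv;
    }

    // Modifies THIS alignment: appends the cells of another alignment (assumed: no deduplication).
    void ingest(ToyAlignment other) { cells.addAll(other.cells); }

    // Modifies THIS alignment: drops cells whose confidence is below the threshold.
    void cut(double threshold) { cells.removeIf(c -> c.confidence < threshold); }

    public static void main(String[] args) {
        ToyAlignment a1 = new ToyAlignment().add("Article", "Paper", 0.9).add("Book", "Volume", 0.3);
        ToyAlignment a2 = new ToyAlignment().add("Author", "Writer", 0.6);
        ToyAlignment a3 = a1.copy();      // a1 is left untouched from here on
        a3.ingest(a2);                    // a3 now holds the cells of a1 and a2
        ToyAlignment a4 = a3.inverse();   // a3 is left untouched
        a4.cut(0.5);                      // only a4 shrinks
        System.err.println(a1.nbCells() + " " + a3.nbCells() + " " + a4.nbCells());
    }
}
```

In this model, a1 keeps its two cells, a3 ends with three and a4 with two, which mirrors why the exercise clones a1 before merging.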

More work: You can consider the more elaborate versions of cut() and compare the results.

Evaluating alignments

Alignments can also be evaluated. For that purpose, the API provides the Evaluator interface. Similarly to AlignmentProcess, this interface is used by:

creating an instance of a particular Evaluator taking as argument a reference alignment and the alignment to evaluate;
processing the evaluation (eval()), optionally with parameters.

The code below first creates a parser for loading the reference alignment, then creates an instance of PRecEvaluator for computing the precision and recall of the alignment a1 above with respect to the reference alignment.

// Load the reference alignment
AlignmentParser aparser = new AlignmentParser(0);
Alignment reference = aparser.parse( new File( "../refalign.rdf" ).toURI() );
Evaluator evaluator = new PRecEvaluator( reference, a1 );
evaluator.eval();

As previously, results are stored within the Evaluator object and are accessed through specific accessors.
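As a reminder of what these measures mean, the following stand-alone sketch computes precision, recall and F-measure over plain sets of strings. It is not the PRecEvaluator code: the class PRecSketch and the encoding of a correspondence as a "left=right" string are assumptions made for this illustration, and confidence values are ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: NOT the PRecEvaluator implementation.
class PRecSketch {

    // Number of correspondences that appear both in the result and in the reference.
    static int correct(Set<String> found, Set<String> reference) {
        Set<String> common = new HashSet<>(found);
        common.retainAll(reference);
        return common.size();
    }

    // Share of the found correspondences that are in the reference.
    static double precision(Set<String> found, Set<String> reference) {
        return found.isEmpty() ? 0.0 : (double) correct(found, reference) / found.size();
    }

    // Share of the reference correspondences that were found.
    static double recall(Set<String> found, Set<String> reference) {
        return reference.isEmpty() ? 0.0 : (double) correct(found, reference) / reference.size();
    }

    // Harmonic mean of precision and recall.
    static double fMeasure(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        Set<String> reference = new HashSet<>(Arrays.asList("a=x", "b=y", "c=z", "d=w"));
        Set<String> found = new HashSet<>(Arrays.asList("a=x", "b=y", "e=v"));
        double p = precision(found, reference);  // 2 correct out of 3 found
        double r = recall(found, reference);     // 2 correct out of 4 expected
        System.err.println("P=" + p + " R=" + r + " F=" + fMeasure(p, r));
    }
}
```

Trimming at a higher threshold usually removes incorrect correspondences (raising precision) but also some correct ones (lowering recall), which is why the F-measure is a convenient single criterion for comparing thresholds.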

As an exercise, one could trim the alignment a1 with thresholds of 0., .2, .4, .6, .8, and 1., evaluate each result for precision and recall, and select the one with the best F-measure.

// Trim at various thresholds
// Evaluate them against the references
// and choose the one with the best F-Measure
double best = 0.;
Alignment result = null;
Properties p = new Properties();
for ( int i = 0; i <= 10 ; i += 2 ){
    a1.cut( ((double)i)/10 );
    // This operation must be repeated because the modifications in a1
    // are not taken into account otherwise
    Evaluator evaluator = new PRecEvaluator( reference, a1 );
    evaluator.eval( p );
    System.err.println("Threshold "+(((double)i)/10)+" : "+((PRecEvaluator)evaluator).getFmeasure()+" over "+a1.nbCells()+" cells");
    if ( ((PRecEvaluator)evaluator).getFmeasure() > best ) {
        result = (BasicAlignment)((BasicAlignment)a1).clone();
        best = ((PRecEvaluator)evaluator).getFmeasure();
    }
}

Displaying an alignment

Finally, alignments can be displayed in a variety of formats through the AlignmentVisitor abstraction. Alignments are displayed by:

creating a PrintWriter in which the visitor will print,
creating the AlignmentVisitor on this writer, and
rendering the alignment (through the render() method).

For instance, it is possible to print on the standard output the alignment selected in the previous exercise as a set of OWL axioms.

// Display it as OWL axioms
PrintWriter writer = new PrintWriter (
    new BufferedWriter(
        new OutputStreamWriter( System.out, "UTF-8" )), true);
AlignmentVisitor renderer = new OWLAxiomsRendererVisitor(writer);
result.render(renderer);
writer.flush();
writer.close();
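The writer, visitor and render steps listed above follow the visitor pattern, which can be sketched without the library. The names ToyVisitor, LineRenderer and render below are hypothetical and much simpler than the AlignmentVisitor interface; the sketch writes to a StringWriter rather than to the standard output so that the result can be inspected.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy visitor, NOT the AlignmentVisitor interface.
class RenderSketch {

    // Hypothetical visitor: one callback per correspondence.
    interface ToyVisitor {
        void visit(String left, String right);
    }

    // Hypothetical renderer printing each correspondence as one line of text.
    static class LineRenderer implements ToyVisitor {
        private final PrintWriter out;
        LineRenderer(PrintWriter out) { this.out = out; }
        public void visit(String left, String right) {
            out.println(left + " = " + right);
        }
    }

    // Plays the role of render(): hands every correspondence to the visitor.
    static void render(List<String[]> cells, ToyVisitor visitor) {
        for (String[] c : cells) visitor.visit(c[0], c[1]);
    }

    // Step 1: create the writer; step 2: create the visitor on it; step 3: render.
    static String renderToString(List<String[]> cells) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer, true);
        ToyVisitor renderer = new LineRenderer(writer);
        render(cells, renderer);
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        List<String[]> cells = Arrays.asList(
            new String[] { "Article", "Paper" },
            new String[] { "Author", "Writer" });
        System.out.print(renderToString(cells));
    }
}
```

Because the output format lives entirely in the visitor, switching to another format only means passing a different visitor to the same rendering call.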

Putting these together

The main piece of code in Skeleton.java is replaced by:

// Run two different alignment methods (e.g., ngram distance and smoa)
AlignmentProcess a1 = new StringDistAlignment();
params.setProperty("stringFunction","smoaDistance");
a1.init ( onto1, onto2 );
a1.align( (Alignment)null, params );
AlignmentProcess a2 = new StringDistAlignment();
a2.init ( onto1, onto2 );
params = new Properties();
params.setProperty("stringFunction","ngramDistance");
a2.align( (Alignment)null, params );

// Merge the two results.
((BasicAlignment)a1).ingest(a2);

// Load the reference alignment
AlignmentParser aparser = new AlignmentParser(0);
// Changed by Angel for Windows
Alignment reference = aparser.parse( new File( "../refalign.rdf" ).toURI() );

// Trim at various thresholds
// Evaluate them against the references
// and choose the one with the best F-Measure
double best = 0.;
Alignment result = null;
Properties p = new Properties();
for ( int i = 0; i <= 10 ; i += 2 ){
    a1.cut( ((double)i)/10 );
    // This operation must be repeated because the modifications in a1
    // are not taken into account otherwise
    Evaluator evaluator = new PRecEvaluator( reference, a1 );
    evaluator.eval( p );
    System.err.println("Threshold "+(((double)i)/10)+" : "+((PRecEvaluator)evaluator).getFmeasure()+" over "+a1.nbCells()+" cells");
    if ( ((PRecEvaluator)evaluator).getFmeasure() > best ) {
        result = (BasicAlignment)((BasicAlignment)a1).clone();
        best = ((PRecEvaluator)evaluator).getFmeasure();
    }
}

// Display it as OWL axioms
PrintWriter writer = new PrintWriter (
    new BufferedWriter(
        new OutputStreamWriter( System.out, "UTF-8" )), true);
AlignmentVisitor renderer = new OWLAxiomsRendererVisitor(writer);
result.render(renderer);
writer.flush();
writer.close();

This can be compiled and used through:

$ javac -classpath ../../../lib/align.jar:../../../lib/procalign.jar -d results MyApp.java
$ java -cp ../../../lib/procalign.jar:results MyApp file://$CWD/myOnto.owl file://$CWD/edu.mit.visus.bibtex.owl > results/MyApp.owl

The execution trace shows which threshold gives the best F-measure:

Threshold 0.0 : 0.4999999999999999 over 140 cells
Threshold 0.2 : 0.5529411764705882 over 122 cells
Threshold 0.4 : 0.5802469135802468 over 114 cells
Threshold 0.6 : 0.6861313868613137 over 89 cells
Threshold 0.8 : 0.7692307692307693 over 69 cells
Threshold 1.0 : 0.5230769230769231 over 17 cells

A full working solution is MyApp.java.

Advanced question: Can you tell why the stored alignment does not seem to contain 69 cells? (Hint: try to render the alignments in RDF and see what happens)

More work: You can add a switch like the -i switch of Procalign so that the class implementing the matching method can be passed on the command line.
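One way to approach such a switch is to instantiate a class from its fully qualified name through reflection. The sketch below uses only the Java standard library; the class LoadByName and the helper newInstance are hypothetical, and java.util.List stands in for the interface that an application would actually expect (presumably AlignmentProcess). It also assumes that the named class has a public no-argument constructor.

```java
import java.util.List;

// Illustrative only: loads a class by name with the standard reflection API.
class LoadByName {

    // Instantiate the named class and check that it has the expected type.
    // Throws ClassNotFoundException (a ReflectiveOperationException) if the
    // name is unknown, and ClassCastException if the type does not match.
    static <T> T newInstance(String className, Class<T> expected)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        Object instance = cls.getDeclaredConstructor().newInstance();
        return expected.cast(instance);
    }

    public static void main(String[] args) throws Exception {
        // The class name would come from the command line, e.g. after a -i switch.
        String name = (args.length > 0) ? args[0] : "java.util.ArrayList";
        List<?> list = newInstance(name, List.class);
        System.out.println(list.getClass().getName());
    }
}
```

Checking the type right after instantiation gives an early and readable failure when the user passes a class that does not implement the expected interface.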

Advanced: What about writing an editor for the alignment API?

Further exercises

More info: https://moex.gitlabpages.inria.fr/alignapi/tutorial/