Here is a tutorial for embedding the Alignment API within your own applications.
This tutorial has been designed for the Alignment API version 4.0.
Of course, the Alignment API is not primarily meant to be used from the command line (even if that can be very useful). So if you are ready for it, you can develop your own Java application that takes advantage of the API.
Starting point
A skeleton program using the Alignment API is Skeleton.java. It can be compiled by invoking:
Now, considering the API (which can be consulted through its thin Javadoc, for instance), the goal is to modify the Skeleton program so that it performs the following:
Run two different alignment methods (e.g., ngram distance and smoa);
Merge the two results;
Trim at various thresholds;
Evaluate them against the reference alignment and choose the one with the best F-Measure;
Display it as OWL axioms.
Of course, you can do it progressively.
Call an alignment method
Matching two ontologies is achieved in three steps:
creating an instance of the class implementing the expected method
(implementing the AlignmentProcess interface);
providing the two ontologies to match to this instance (init);
calling the matching method (align).
The matching method takes two arguments: an optional alignment to
improve on (which can be null) and a set of parameters. So, the three
lines below match ontologies onto1 and onto2 with
the StringDistAlignment matcher:
AlignmentProcess a = new StringDistAlignment();
a.init ( onto1, onto2 );
a.align( (Alignment)null, new Properties() );
Now, the first step is to run two different instances
of StringDistAlignment with different stringFunction
parameters corresponding to "smoaDistance" and "ngramDistance".
Properties params = new Properties();
// Run two different alignment methods (e.g., ngram distance and smoa)
AlignmentProcess a1 = new StringDistAlignment();
params.setProperty("stringFunction","smoaDistance");
a1.init ( onto1, onto2 );
a1.align( (Alignment)null, params );
AlignmentProcess a2 = new StringDistAlignment();
a2.init ( onto1, onto2 );
params = new Properties();
params.setProperty("stringFunction","ngramDistance");
a2.align( (Alignment)null, params );
After this step, the two matching methods have been processed and the
result is available within the alignment instances (a1
and a2).
Manipulate alignments (merge and trim)
Alignments offer methods to manipulate them. In
particular, it is possible to
clone alignments (clone()),
invert alignments (change the order of ontologies, inverse()),
merge alignments (put all their correspondences together, ingest(Alignment)),
trim them under various thresholds and threshold modalities (cut(threshold)).
Of these, cloning and inverting create a new alignment while the
other operations modify the alignment in place.
The goal of the exercise is to create a copy of
alignment a1, merge it with a2, invert the
order of their ontologies, and finally trim the result under a
threshold of .5. At each step, display the number of
correspondences in the resulting alignment.
More work: You can consider the more
elaborate versions of cut() and compare the results.
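To make the semantics of these four operations concrete before trying the exercise with the real classes, here is a minimal sketch on a toy model. Everything in it (the ToyAlignmentOps class, its Cell type and the sample data) is hypothetical and only stands in for the API's own classes. It assumes that cutting keeps the correspondences whose confidence is at or above the threshold, and that merging does not eliminate duplicates; the real implementation may differ on both points.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an alignment: NOT the Alignment API classes.
final class ToyAlignmentOps {

    /** One correspondence: two entity names and a confidence in [0,1]. */
    static final class Cell {
        final String left;
        final String right;
        final double strength;

        Cell(String left, String right, double strength) {
            this.left = left;
            this.right = right;
            this.strength = strength;
        }
    }

    /** Returns a new list, so later changes do not affect the original (like clone()). */
    static List<Cell> copy(List<Cell> cells) {
        return new ArrayList<>(cells);
    }

    /** Adds every cell of other to target, in place (like ingest()). */
    static void merge(List<Cell> target, List<Cell> other) {
        target.addAll(other);
    }

    /** Returns a new list with the two sides of each cell swapped (like inverse()). */
    static List<Cell> invert(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.right, c.left, c.strength));
        }
        return out;
    }

    /** Removes, in place, every cell whose strength is below the threshold (like cut()). */
    static void cut(List<Cell> cells, double threshold) {
        cells.removeIf(c -> c.strength < threshold);
    }

    public static void main(String[] args) {
        List<Cell> a1 = new ArrayList<>();
        a1.add(new Cell("Book", "Volume", 0.9));
        a1.add(new Cell("Author", "Writer", 0.3));
        List<Cell> a2 = new ArrayList<>();
        a2.add(new Cell("Title", "Name", 0.6));

        List<Cell> work = copy(a1);
        System.out.println("after copy:   " + work.size());
        merge(work, a2);
        System.out.println("after merge:  " + work.size());
        work = invert(work);
        System.out.println("after invert: " + work.size());
        cut(work, 0.5);
        System.out.println("after cut:    " + work.size());
        System.out.println("a1 untouched: " + a1.size());
    }
}
```

The sketch mirrors the distinction made above: copy and invert hand back a new object, while merge and cut change the object they are called on, which is why the copy is taken first.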
Evaluating alignments
Alignments can also be evaluated. For that purpose, the API provides
the Evaluator interface. As with AlignmentProcess, this interface is used by:
creating an instance of a particular Evaluator, taking as
arguments a reference alignment and the alignment to evaluate;
processing the evaluation (eval()), optionally with a parameter.
The code below first creates a parser for loading the
reference alignment, then creates an instance
of PRecEvaluator for computing the precision and recall of
the alignment a1 above with respect to the reference alignment.
// Load the reference alignment
AlignmentParser aparser = new AlignmentParser(0);
Alignment reference = aparser.parse( new File( "../refalign.rdf" ).toURI() );
Evaluator evaluator = new PRecEvaluator( reference, a1 );
evaluator.eval();
As previously, results are stored within the Evaluator object
and are accessed through specific accessors.
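For readers who want a reminder of what these measures mean, the following is a minimal sketch of precision, recall and F-measure over plain sets of strings. It is not the implementation of PRecEvaluator: the ToyPRec class, the "A=B" encoding of correspondences and the convention of returning 0 for empty sets are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy precision/recall computation: NOT the API's PRecEvaluator.
final class ToyPRec {

    /** Fraction of the found correspondences that are also in the reference. */
    static double precision(Set<String> reference, Set<String> found) {
        if (found.isEmpty()) return 0.0; // assumed convention
        return (double) common(reference, found) / found.size();
    }

    /** Fraction of the reference correspondences that were found. */
    static double recall(Set<String> reference, Set<String> found) {
        if (reference.isEmpty()) return 0.0; // assumed convention
        return (double) common(reference, found) / reference.size();
    }

    /** Harmonic mean of precision and recall (0 when both are 0). */
    static double fMeasure(double precision, double recall) {
        if (precision + recall == 0.0) return 0.0;
        return 2 * precision * recall / (precision + recall);
    }

    /** Size of the intersection of the two sets. */
    private static int common(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.retainAll(b);
        return both.size();
    }

    public static void main(String[] args) {
        Set<String> reference = new HashSet<>(Arrays.asList(
            "Book=Volume", "Author=Writer", "Title=Name", "Page=Sheet"));
        Set<String> found = new HashSet<>(Arrays.asList(
            "Book=Volume", "Author=Writer", "Title=Price"));
        double p = precision(reference, found); // 2 of the 3 found are correct
        double r = recall(reference, found);    // 2 of the 4 expected were found
        System.out.println("precision = " + p);
        System.out.println("recall    = " + r);
        System.out.println("F-measure = " + fMeasure(p, r));
    }
}
```

Raising the trimming threshold typically removes doubtful correspondences, which tends to raise precision and lower recall; the F-measure is a single number that balances the two.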
As an exercise, one could try to trim the alignment a1 with
thresholds of 0., .2, .4, .6, .8, and 1., evaluate each result
for precision and recall, and select the one with the best
F-measure.
// Trim at various thresholds
// Evaluate them against the references
// and choose the one with the best F-Measure
double best = 0.;
Alignment result = null;
Properties p = new Properties();
for ( int i = 0; i <= 10 ; i += 2 ){
    double threshold = ((double)i)/10;
    a1.cut( threshold );
    // A new evaluator must be created at each step because the
    // modifications in a1 are not taken into account otherwise
    PRecEvaluator evaluator = new PRecEvaluator( reference, a1 );
    evaluator.eval( p );
    double fmeasure = evaluator.getFmeasure();
    System.err.println("Threshold "+threshold+" : "+fmeasure+" over "+a1.nbCells()+" cells");
    if ( fmeasure > best ) {
        result = (BasicAlignment)((BasicAlignment)a1).clone();
        best = fmeasure;
    }
}
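The loop above relies on two properties that are easy to overlook: trimming is destructive and cumulative, so the thresholds must be visited in increasing order, and the best state must be copied, because the alignment keeps shrinking afterwards. The following minimal sketch isolates that pattern on a toy list of confidence values. The ToySweep class, the bestSnapshot method and the scoring function are hypothetical; the score merely stands in for an F-measure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Toy threshold sweep: NOT the Alignment API. Each Double stands in for a cell.
final class ToySweep {

    /**
     * Cuts the working list in place at each threshold in turn and returns
     * a snapshot of the state that obtained the highest score.
     */
    static List<Double> bestSnapshot(List<Double> working, double[] thresholds,
                                     ToDoubleFunction<List<Double>> score) {
        double best = 0.0;
        List<Double> result = null;
        for (double t : thresholds) {
            working.removeIf(s -> s < t);            // destructive, like cut()
            double current = score.applyAsDouble(working);
            if (current > best) {
                result = new ArrayList<>(working);   // snapshot, like clone()
                best = current;
            }
        }
        return result; // null if no threshold scored above 0
    }

    public static void main(String[] args) {
        List<Double> working = new ArrayList<>();
        for (double s : new double[] {0.1, 0.3, 0.5, 0.7, 0.9}) {
            working.add(s);
        }
        // Hypothetical score that favours keeping exactly three cells.
        ToDoubleFunction<List<Double>> score = l -> 1.0 / (1 + Math.abs(l.size() - 3));
        List<Double> best = bestSnapshot(
            working, new double[] {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}, score);
        System.out.println("best snapshot: " + best);
        System.out.println("working list after the sweep: " + working);
    }
}
```

Without the snapshot, the returned object would be the working list itself, which is empty by the end of the sweep. Note also that, with a strict comparison and a best score starting at 0, the result stays null when no threshold scores above zero, so it is worth checking for null before using it.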
Displaying an alignment
Finally, alignments can be displayed in a variety of formats
through the AlignmentVisitor abstraction. Alignments are
displayed by:
creating a PrintWriter in which the visitor will print,
creating the AlignmentVisitor on this writer, and
rendering the alignment (through the render() method).
For instance, it is possible to print on the standard output the
alignment selected at the previous exercise as a set of OWL axioms.
// Display it as OWL axioms
PrintWriter writer = new PrintWriter (
new BufferedWriter(
new OutputStreamWriter( System.out, "UTF-8" )), true);
AlignmentVisitor renderer = new OWLAxiomsRendererVisitor(writer);
result.render(renderer);
writer.flush();
writer.close();
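Since closing a writer also closes the stream it wraps, the code above leaves System.out closed once it has run. When the rendered text is needed inside the application (for a test or a GUI, say), one option is to build the same writer chain over an in-memory buffer. The following is a minimal sketch using only the JDK; the ToyCapture class and its renderTo method are hypothetical placeholders for the call to result.render(renderer).

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Capturing rendered output in memory; the rendering itself is a stand-in.
final class ToyCapture {

    /** Hypothetical placeholder for result.render(renderer). */
    static void renderTo(PrintWriter writer) {
        writer.println("<!-- the rendered alignment would be written here -->");
    }

    /** Builds the same writer chain as above, but over an in-memory buffer. */
    static String capture() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(
            new BufferedWriter(
                new OutputStreamWriter(buffer, StandardCharsets.UTF_8)), true);
        renderTo(writer);
        writer.flush();
        writer.close(); // closes only the buffer; System.out stays usable
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

Passing a Charset constant instead of the "UTF-8" string also avoids the checked UnsupportedEncodingException that the string-based constructor declares.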
Putting these together
Do you want to see a possible solution?
The main piece of code in Skeleton.java is replaced by:
// Run two different alignment methods (e.g., ngram distance and smoa)
AlignmentProcess a1 = new StringDistAlignment();
params.setProperty("stringFunction","smoaDistance");
a1.init ( onto1, onto2 );
a1.align( (Alignment)null, params );
AlignmentProcess a2 = new StringDistAlignment();
a2.init ( onto1, onto2 );
params = new Properties();
params.setProperty("stringFunction","ngramDistance");
a2.align( (Alignment)null, params );
// Merge the two results.
((BasicAlignment)a1).ingest(a2);
// Load the reference alignment
AlignmentParser aparser = new AlignmentParser(0);
Alignment reference = aparser.parse( new File( "../refalign.rdf" ).toURI() );
// Trim at various thresholds
// Evaluate them against the references
// and choose the one with the best F-Measure
double best = 0.;
Alignment result = null;
Properties p = new Properties();
for ( int i = 0; i <= 10 ; i += 2 ){
    double threshold = ((double)i)/10;
    a1.cut( threshold );
    // A new evaluator must be created at each step because the
    // modifications in a1 are not taken into account otherwise
    PRecEvaluator evaluator = new PRecEvaluator( reference, a1 );
    evaluator.eval( p );
    double fmeasure = evaluator.getFmeasure();
    System.err.println("Threshold "+threshold+" : "+fmeasure+" over "+a1.nbCells()+" cells");
    if ( fmeasure > best ) {
        result = (BasicAlignment)((BasicAlignment)a1).clone();
        best = fmeasure;
    }
}
// Display it as OWL axioms
PrintWriter writer = new PrintWriter (
new BufferedWriter(
new OutputStreamWriter( System.out, "UTF-8" )), true);
AlignmentVisitor renderer = new OWLAxiomsRendererVisitor(writer);
result.render(renderer);
writer.flush();
writer.close();
Advanced question: Can you tell why the stored alignment does not seem to contain 69 cells? (Hint: try to render the alignments in RDF and see what happens)
More work: You can add a switch like the -i switch of Procalign so that the main class of the application can be passed on the command line.
Advanced: What about writing an editor for the alignment API?