Computing model

Transformation flows are sets of transformations connected by channels on their input/output ports. A transformation can in turn be either a transformation flow or an elementary transformation. Channels carry the information to be transformed (currently only XML, in the form of SAX events [boag2000a]). Transformations can take several inputs and provide several outputs during one execution. The Transmorpher computing model is thus rather simple.

Transmorpher enables the description of transformation flows in XML. It also defines a set of abstract elementary transformations, each provided with an interface and an execution model. Currently, the available transformation abstractions are: generators, serializers, rule set processors, dispatchers, mergers, query evaluators, external processing calls and iterators.

The interpretation of a transformation flow consists of creating the transformations, connecting them through channels and providing input to the source input channels. This interpretation can be triggered at the shell level, embedded in another application, triggered as an Ant task or delivered as a servlet.
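The three interpretation steps (create, connect, feed) can be sketched with the standard Java SAX API. This is only an illustration under assumptions, not Transmorpher's own classes: FlowSketch, UpperCaseNames and TextSerializer are hypothetical names, and a chain of SAX handlers stands in for the channels.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FlowSketch {
    /** Hypothetical elementary transformation: upper-cases element names. */
    static class UpperCaseNames extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, local, qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, local, qName.toUpperCase(Locale.ROOT));
        }
    }

    /** Hypothetical serializer: one input, no output; writes tags and text, ignores attributes. */
    static class TextSerializer extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    static String run(String xml) {
        try {
            // 1. Create the transformations (the parser plays the generator role).
            XMLReader generator = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            UpperCaseNames transform = new UpperCaseNames();
            TextSerializer serializer = new TextSerializer();
            // 2. Connect them: each link carries a stream of SAX events.
            transform.setParent(generator);
            transform.setContentHandler(serializer);
            // 3. Provide input to the source of the flow.
            transform.parse(new InputSource(new StringReader(xml)));
            return serializer.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b>hi</b></a>"));
    }
}
```

The same run method could be called from a shell entry point, from another application or from a servlet, which is the sense in which the interpretation can be embedded.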

Transmorpher is thus made of two main parts: a set of documented Java classes (which can be refined and integrated in other software) and an interpreter of transformation flows. Transformation flows can be specified by programming the class instantiation in Java, by composing them in a graphical interface or by describing them in XML.

Processes

Transformation flows are described in an XML document which clearly separates the rules from the processing. Each flow is described by a process element, and one document can contain several such processes.

The content of a process is called the process body. It contains a set of subprocesses, whose main types are:

<apply-process name='name'/>: which calls an already defined process,
<apply-ruleset name='name' strategy='strategy' />: which applies a set of rules (equivalent to an XSLT stylesheet) to its input,
<apply-external type='type' file='file' />: which calls an external procedure on the input and must provide the output. This engine could be Perl, XSLT, or whatever is appropriate.
<apply-query name='name' type='type' file='file' />: which evaluates a query on the input and must provide the output. This query engine could be XQL, SQL, or whatever is appropriate.
<repeat bin='channels' tests='channels'/>: which applies the enclosed treatment a given number of times (or until the input and output are the same). Buffering channels are provided for expressing the information flow.
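The stopping rule of repeat (a bounded number of iterations, or earlier when the output equals the input) can be sketched as follows. This is a minimal illustration, assuming that strings stand in for the documents carried by the channels; RepeatSketch and its repeat method are hypothetical names, not part of Transmorpher.

```java
import java.util.function.UnaryOperator;

class RepeatSketch {
    /**
     * Applies body to input until the result no longer changes,
     * or until maxIterations applications have been made.
     */
    static String repeat(UnaryOperator<String> body, String input, int maxIterations) {
        String current = input;
        for (int i = 0; i < maxIterations; i++) {
            String next = body.apply(current);
            if (next.equals(current)) {
                return next; // fixed point reached: input and output are the same
            }
            current = next;
        }
        return current; // iteration bound reached first
    }

    public static void main(String[] args) {
        // Hypothetical treatment: collapse doubled letters "aa" into "a".
        System.out.println(repeat(s -> s.replace("aa", "a"), "aaaa", 10));
    }
}
```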

Other instrumental subprocesses are:

<dispatch type='type'/>: which has one input and several outputs,
<merge type='type'/>: which has several inputs and one output,
<generate type='type'/>: which has no input and one output (generally used to read from outer streams like files),
<serialize type='type'/>: which has one input and no output (generally used to write to outer streams like files).
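As an illustration of the one-input, several-outputs shape of a dispatcher, here is a sketch built on the standard SAX ContentHandler interface. It assumes a broadcast policy (every event is copied to every output), which is only one possible dispatch type; DispatchSketch, Broadcast and ElementCounter are hypothetical names.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class DispatchSketch {
    /** Hypothetical dispatcher: copies element and text events to each output. */
    static class Broadcast extends DefaultHandler {
        private final List<ContentHandler> outputs;

        Broadcast(List<ContentHandler> outputs) {
            this.outputs = outputs;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            for (ContentHandler out : outputs) out.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            for (ContentHandler out : outputs) out.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            for (ContentHandler out : outputs) out.characters(ch, start, length);
        }
    }

    /** Hypothetical consumer that counts the elements it receives. */
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses xml once and returns the element counts seen on the two outputs. */
    static int[] run(String xml) {
        ElementCounter left = new ElementCounter();
        ElementCounter right = new ElementCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new Broadcast(List.<ContentHandler>of(left, right)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new int[] { left.count, right.count };
    }

    public static void main(String[] args) {
        int[] counts = run("<a><b/><c/></a>");
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

A merger would be the mirror image (several event sources feeding one handler) and needs a policy for ordering its inputs, which this sketch does not attempt.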

Each of these primitives has an id (enabling the identification of subprocesses of the same kind) and in and out attributes (enabling their connection to other processes). The name attribute denotes an element defined within the current transformation (or an imported one). The type attribute identifies a particular implementation of the basic process.

Example 2 is a Transmorpher transformation flow in which the processGeneral process corresponds to the flow described by the figure above.

Channels

In Transmorpher the generic processes can have several input and several output ports. These ports are connected to channels that are fed by the output of a process and can be used as input of other processes. Channels are abstractions that enable the expression of the flow of information in a compound transformation, not the mark of a particular implementation. The set of channels is called the dataflow.

The channels specify a unit in which processes can read and write. They are named streams which can be visible from outside a process if they are declared as its input or output.

The control inside a process can be deduced from the dataflow. There is no explicit operator for parallelizing or composing transformations: the channels denote composition, precedence and independence of processes.
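One way to read precedence off the dataflow is to schedule a process only once all the channels it reads have been fed. The sketch below does this by channel name alone. It is an assumption-laden illustration, not Transmorpher's scheduler: DataflowOrder and Proc are hypothetical names, and a streaming implementation may in practice run connected processes concurrently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class DataflowOrder {
    /** Hypothetical process description: an id plus the channels it reads and writes. */
    static final class Proc {
        final String id;
        final List<String> in;
        final List<String> out;

        Proc(String id, List<String> in, List<String> out) {
            this.id = id;
            this.in = in;
            this.out = out;
        }
    }

    /** Returns process ids in an order where every input channel is fed before it is read. */
    static List<String> order(List<Proc> procs, Set<String> sources) {
        Set<String> available = new HashSet<>(sources);
        List<Proc> pending = new ArrayList<>(procs);
        List<String> result = new ArrayList<>();
        boolean progress = true;
        while (!pending.isEmpty() && progress) {
            progress = false;
            for (Iterator<Proc> it = pending.iterator(); it.hasNext();) {
                Proc p = it.next();
                if (available.containsAll(p.in)) {
                    result.add(p.id);
                    available.addAll(p.out); // its outputs can now feed other processes
                    it.remove();
                    progress = true;
                }
            }
        }
        if (!pending.isEmpty()) {
            throw new IllegalStateException("some input channels are never fed");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Proc> flow = List.of(
                new Proc("serialize", List.of("c"), List.of()),
                new Proc("merge", List.of("a", "b"), List.of("c")),
                new Proc("dispatch", List.of("src"), List.of("a", "b")));
        System.out.println(order(flow, Set.of("src")));
    }
}
```

Two processes that share no channel, directly or indirectly, are left unordered with respect to each other by this rule, which corresponds to the independence mentioned above.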

Alternatives to channels could have been adopted in order to deal uniformly with input/output. The solution taken by [drewes2000a] consists of considering each transformation as a function from one (not necessarily connected) graph to another. This solution has the advantage of using functions and sticking to the initial XSLT design, but it does not preserve the order between the graphs. The graph notion can be replaced by the node-set notion of XSLT/XPath. However, on the application side, the objects to be manipulated are documents and not graphs or node-sets, so we kept modeling multiple input/output transformations.

The channels are currently implemented as SAX2 event flows, with all data encoded in UTF-8. They can thus only carry XML data (which can be text). Very few verifications are performed on the channels so far.

Parameters

Parameters make it possible to parameterize the behavior of a transformation without using XML documents. They are declared by the param element, which can appear at the beginning (first position) of process bodies, queries and rulesets. They are passed from a process to the processes it calls through with-param elements. Finally, they are evaluated within string fields as dollar-prefixed strings ("$aa" and "${aa}" both stand for the variable named aa, whereas "\$aa" stands for the string "$aa").

Parameter values are strings. They can be used by the external programs that are called by Transmorpher.
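The dollar-prefixed notation can be illustrated by a small substitution routine. This is a sketch, not Transmorpher's evaluator: ParamExpander is a hypothetical name, parameter names are assumed to consist of letters, digits and underscores, and leaving an undefined parameter untouched is an assumption rather than documented behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamExpander {
    // Alternatives, tried left to right: escaped dollar (\$), ${name}, $name.
    private static final Pattern REF =
            Pattern.compile("\\\\\\$|\\$\\{(\\w+)\\}|\\$(\\w+)");

    /** Replaces $name and ${name} by their values; \$ yields a literal dollar sign. */
    static String expand(String text, Map<String, String> params) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String replacement;
            if (name == null) {
                replacement = "$"; // the escaped form \$
            } else {
                // Assumption: an undefined parameter is left as written.
                replacement = params.getOrDefault(name, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("$aa-${aa}-\\$aa", Map.of("aa", "x")));
    }
}
```

The braced form is useful when the reference is immediately followed by characters that could belong to a name, as in "${aa}bb".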

Feel free to comment to Jérôme:Euzenat#inrialpes:fr, $Id: model.html,v 1.5 2005-10-25 13:34:00 euzenat Exp $