Rulesets

The lower level of Transmorpher, the one that actually implements a transformation, is made of rules (corresponding to XSLT templates). Instead of a single, very general kind of template, Transmorpher provides a collection of rules that are simple to use and easy to analyze.

The advantages of rules are that:

Rules

The rules themselves are very simple templates that can be used for specifying simple transformations of a source. Here is the current set of rules:

<maptag match='tag1' target='tag2' />

maps a particular tag (tag1) to another one (tag2). It can contain remtag, maptag, mapatt, rematt or addatt tags for modifying the content of the matched tags. This is useful for straightforward DTD translation.

<maptag match='reference' target='bibitem' context='bibliography'/>

transforms each reference element in the context of a bibliography element into a bibitem element.
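Since rules correspond to XSLT templates, the maptag example above can be pictured as a template that renames the matched element. The stylesheet below is only a sketch of that idea, not the code Transmorpher generates. It assumes that context and match combine into the pattern bibliography/reference, and that the attributes and children of the renamed element are copied unchanged.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Copy every node that no more specific template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Assumed equivalent of the maptag rule: a reference whose parent is
       a bibliography is rewritten as a bibitem with the same content -->
  <xsl:template match="bibliography/reference">
    <bibitem>
      <xsl:apply-templates select="@*|node()"/>
    </bibitem>
  </xsl:template>
</xsl:stylesheet>
```

The rule form states the same intent in one line, which is what makes it easier to analyze than a hand-written template.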

<remtag match='tag' />

simply removes the subtrees rooted at the specified tag. This is useful for simplifying the structure of the document.

<remtag match="abstract" context="reference"/>

suppresses all abstract elements (and all their content) in the context of a reference element.

<flatten match='tag' />

replaces a tree by its subtrees (or text) in a structure. This rearrangement is useful for information gathered from multiple sources.

<flatten match='bibliography'/>

suppresses all the bibliography elements within another bibliography element (but not their content).
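In XSLT terms, flattening amounts to processing the children of an element without copying the element itself. The sketch below illustrates this for the example above. It is an assumption rather than Transmorpher's generated output: it takes the rule to apply to a bibliography whose parent is another bibliography, and it discards the attributes of the flattened element.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Copy every node that no more specific template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Assumed equivalent of the flatten rule: drop the nested bibliography
       element but keep processing its children (elements and text) -->
  <xsl:template match="bibliography/bibliography">
    <xsl:apply-templates select="node()"/>
  </xsl:template>
</xsl:stylesheet>
```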

<mapatt match='name1' target='name2' />

maps a particular attribute to another one.

<mapatt match='issue' target='number' context='conference'/>

transforms each issue attribute in the context of a conference element into a number attribute.
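An attribute mapping can be pictured as a template that matches the old attribute and emits a new one with the same value. The following is a hedged sketch of such a template, assuming that context and match combine into the pattern conference/@issue; the stylesheet Transmorpher actually produces may differ.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Copy every node that no more specific template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Assumed equivalent of the mapatt rule: the issue attribute of a
       conference element is re-emitted under the name number -->
  <xsl:template match="conference/@issue">
    <xsl:attribute name="number">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```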

<rematt name='name' />

simply removes the attributes specified by name.

<rematt match="status"/>
<rematt match="isbn" context="book"/>

removes all status attributes and all ISBN attributes in the context of a book element.
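Removing an attribute corresponds to an empty XSLT template on that attribute. The sketch below shows both rules in that form. The pattern book/@isbn is an assumption about how the context attribute is combined with the attribute name, by analogy with the way element rules are restricted to a context.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Copy every node that no more specific template handles -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: status attributes are dropped wherever they occur -->
  <xsl:template match="@status"/>
  <!-- Assumed pattern for the rule with a context: isbn attributes are
       dropped only when they belong to a book element -->
  <xsl:template match="book/@isbn"/>
</xsl:stylesheet>
```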

<addatt name='name' value='value' />

adds a particular attribute and value to the considered elements. This is useful for decorating a subtree with a particular attribute (e.g. to color all level 1 titles in red).

<addatt name="color" value="red"/>

<resubst match='path' source='re' target='ss' />

substitutes each occurrence of the regular expression re in the content of the context path (usually the value of an attribute or the unstructured content of an element) by the substitution string ss (which can refer to extracted fragments).

<resubst match="conference/@issue" source="([0-9]+)" target="$1e"/>

substitutes each number by the same number followed by "e" in the issue attribute of a conference element.

resubst is not pure XSLT: it is implemented as a Xalan extension function and uses the gnu.regexp package. It might become a standard feature of XSLT 2.0 (or a standard extension function of XSLT 2.0 implemented with standard Java 1.4 regexp substitution).

All these rule tags can use the context attribute, which restricts the evaluation context of a rule with an XPath location.

Adding new rule types

Transmorpher does not yet offer a way to add new rule types, but it could in the future.

Rulesets

The rule constructs are grouped into rulesets, which correspond to XSLT stylesheets and can also contain regular XSLT templates. Rulesets serve the same purpose as XSLT stylesheets, with a restricted set of actions.
The rulesets have one implicit input and one implicit output.

In the transformation flow initially provided, the stripAbstract ruleset suppresses the abstract elements (and other minor elements) and the elements marked as private.

<ruleset name="stripAbstract">
  <remtag match="abstract" context="reference"/>
  <remtag match="keywords" context="reference"/>
  <remtag match="areas" context="reference"/>
  <remtag match="softwares" context="reference"/>
  <remtag match="contracts" context="reference"/>
  <remtag match="*[@status='hidden']"/>
  <resubst match="conference/@issue" source="([0-9]+)" target="$1e"/>
  <rematt match="status"/>
  <rematt match="isbn" context="book"/>
</ruleset>
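Other rule types can be grouped in the same way. As an illustration only, the ruleset below collects the maptag, mapatt and flatten examples given earlier on this page; the name toBibitem is hypothetical and does not appear in the Transmorpher distribution.

```xml
<!-- Hypothetical ruleset assembled from the rule examples on this page -->
<ruleset name="toBibitem">
  <!-- Rename each reference found in a bibliography to bibitem -->
  <maptag match="reference" target="bibitem" context="bibliography"/>
  <!-- Rename the issue attribute of conference elements to number -->
  <mapatt match="issue" target="number" context="conference"/>
  <!-- Remove nested bibliography elements while keeping their content -->
  <flatten match="bibliography"/>
</ruleset>
```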


Applying ruleset

The apply-ruleset element introduces the use of a ruleset in a process body. Its structure is the following:

<apply-ruleset ref="name" id="id" in="channel" out="channel"> {<with-param>}* </apply-ruleset>

The channels, implicit in the writing of the rules, must be named when applying a ruleset:

<apply-ruleset ref="stripAbstract" id="StripAbstract" in="D1" out="X1"/>
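Because each application names its input and output channels, two rulesets can presumably be chained by using the output channel of the first as the input channel of the second. The fragment below sketches this inside a process body. The second ruleset (toBibitem) and the channel X2 are hypothetical names, and the assumption that a shared channel name is enough to connect the two steps is not confirmed on this page.

```xml
<!-- Fragment of a process body, not a complete document -->
<!-- First step: reads channel D1, writes channel X1 -->
<apply-ruleset ref="stripAbstract" id="StripAbstract" in="D1" out="X1"/>
<!-- Second step (hypothetical ruleset and channel): reads X1, writes X2 -->
<apply-ruleset ref="toBibitem" id="ToBibitem" in="X1" out="X2"/>
```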

The ruleset control scheme is exactly the same as that of XSLT: one-pass top-down evaluation. The apply-ruleset tag has a strategy attribute in which we plan to specify other evaluation strategies.

The ruleset implementation consists of transforming the ruleset into a stylesheet that is processed by Xalan. It is easy to see how these rulesets can be transformed into a proper XSLT stylesheet.

The XSLT stylesheet corresponding to the ruleset above is given below:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:regexp="xalan://fr.fluxmedia.transmorpher.regexp.RegularExpression">
  <!-- ************************************************************ -->
  <!-- This style sheet was generated by Transmorpher -->
  <!-- ************************************************************ -->
  <!-- Copying the root and its attributes -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:attribute name="{name()}"><xsl:value-of select="."/></xsl:attribute>
  </xsl:template>
  <!-- Copying all elements and attributes -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>
  <!-- ************************************************************ -->
  <!-- End of the general section, here begins the stylesheet -->
  <!-- ************************************************************ -->
  <!-- Removing elements abstract -->
  <xsl:template match="reference/abstract"/>
  <!-- Removing elements keywords -->
  <xsl:template match="reference/keywords"/>
  <!-- Removing elements areas -->
  <xsl:template match="reference/areas"/>
  <!-- Removing elements softwares -->
  <xsl:template match="reference/softwares"/>
  <!-- Removing elements contracts -->
  <xsl:template match="reference/contracts"/>
  <!-- Removing elements *[@status='hidden'] -->
  <xsl:template match="*[@status='hidden']"/>
  <!-- Substituting all ([0-9]+) by $1e in conference/@issue -->
  <xsl:template match="conference/@issue">
    <xsl:if test="function-available('regexp:substitute') and function-available('regexp:substituteAll')">
      <xsl:attribute name="issue">
        <xsl:value-of select="regexp:substituteAll(.,'([0-9]+)','$1e')"/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
  <!-- Removing attributes status -->
  <xsl:template match="@status"/>
</xsl:stylesheet>

Strategies

The strategic aspect of rulesets is not implemented yet. Currently, since the rulesets are compiled as XSLT transformations, the default strategy can be considered as "outermost-once-with-explicit-calls".



Feel free to comment to Jérôme:Euzenat#inrialpes:fr, $Id: rulesets.html,v 1.6 2005-10-25 13:34:00 euzenat Exp $