Achieving Extensibility and Reuse for XSLT2.0 Stylesheets

Matthew Fuchs

Abstract

The appearance of the element() predicate and the ability to return source tree nodes from templates provide XSLT2 with mechanisms to support extensibility and reuse in ways not available in XSLT1. We present a significant example of these involving both vertical and horizontal changes, where a single stylesheet exploiting these features can be applied to incompatible but similar versions of a schema as well as to derived schemas created using XSD's inheritance mechanisms.

Keywords: Schema Languages; XSLT; XSD/W3C Schema

Matthew Fuchs

Matthew Fuchs, PhD, is senior architect at Westbridge Technology. Before that, he was chief scientist for XML related technologies at CommerceOne, an industry leader in electronic commerce. He co-authored the "Schema for Object Oriented XML" and designed its object-oriented features (and the software that exploits them). He received his Ph.D. from NYU in 1995, where his work on mobile object systems started his fixation on using XML (and its SGML predecessor) as a metalanguage for describing agent communication languages. Dr. Fuchs was a founding member of the W3C working group that created XML and is a member of the XML Schema Working Group. Before CommerceOne, he was a researcher at Walt Disney Imagineering and at WVU's Concurrent Engineering Research Center.

Matthew Fuchs [Senior Architect; Westbridge Technology]

Extreme Markup Languages 2004® (Montréal, Québec)

Copyright © Matthew D. Fuchs 2004. Reproduced with permission.

Introduction

Code reuse and extensibility are a constant concern of developers in all areas, including markup. Reuse and extensibility simplify projects, shorten development times, and please customers, but remain an elusive goal. Controversial though they may be, the W3C XML Schema Definition Language's (XSD)[XSD] inheritance (complexType derivation and element substitution groups) and composition (import) mechanisms are borrowed from software engineering and language design to support reuse and extensibility. Exploitation of these XSD features has lagged because of a lack of tool support outside of tools to generate program components. In particular, XSLT1.0[XSLT1], perhaps the most used "native" XML processing language, has no facilities to exploit these features - two elements in the same substitution group have no relationship to each other, and the type of an element frequently doesn't even appear in the document. However, XSLT2[XSLT2] and XPath2[XPATH] change the situation by providing new features.

We will examine two new features for XSLT2 - the element() predicate and the ability to return source trees from a variety of constructs - and show how these together with XSD's features provide robust support for certain patterns of versioning and reuse. We will apply these in two scenarios. In the first, we will have a schema derived from another using XSD's inheritance mechanisms. Despite the different element structure of the two schemas, we will show how a stylesheet can be designed to apply, unchanged, to documents for both schemas. In the second example, we have two very similar, but syntactically unrelated, schemas. Because XSLT2 allows us to return source tree fragments from templates, we can abstract out document traversal. This allows us to create reusable generic stylesheets where the difference among schemas is handled by schema-specific accessor templates. By combining both techniques we get a stylesheet applicable to documents from all three schemas.

In order to provide a realistic setting for this effort, we will start with a reasonably large existing example, based on UBL[UBL]. We have a set of preexisting schemas, document, and stylesheet. The schemas are the UBL 0.7[UBL07] and 1.0beta[UBL1B] releases, the stylesheet is one developed by Ken Holman[HOLMAN] for the 0.7 release, and the document is a sample provided with UBL. We will add a derived schema to the UBL 0.7 release, and will modify the stylesheet using the appropriate XSLT2 features so it straightforwardly supports the 0.7 schema, the 1.0beta schema, and the derived schema. This multischema support is not possible in XSLT1. For an example of the derived schema, we will rewrite the example document into one for the derived schema.

UBL is a particularly interesting use case, as it was designed to support reusability of the first sort, although not the second. Our results will demonstrate the effectiveness of UBL for creating derived and versioned schemas. It is interesting to note that neither extensibility nor reuse is a goal of XSLT2[XREQS].

The UBL distribution includes several schemas that use each other in a layered fashion. We are interested in the Order schema which describes a purchase order. It heavily relies on a set of reusable components found in the CommonAggregateTypes module, for which we will use the prefix "cat", except when mixing versions 0.7 and 1.0beta, where we will use "cat-0.7" and "cat-1.0beta" respectively. The structure of the 1.0beta release is very similar to the 0.7 release.

This is certainly not intended as the last word on this topic; we have specifically provided an OO pattern reliant on awareness of XSD extensions. There are other situations, such as when a processor won't accept additional schema information, or is not using XSD as a schema language, not covered here. Also, we expose some weaknesses in the XSD "story" - such as not allowing locals to be in substitution groups - as well as in the current draft of XSLT2 - not making "element" and "type" axes for XPath.

Supporting Derived Schemas

Figure 1 shows the UBL schema definition for OrderLine, Item and their associated types. Due to the UBL approach, each type contains numerous global elements, many of which may not be required in a specialization. The UBL 0.7 distribution provides several example documents. One, called JoineryOrder.xml, includes the fragment in figure 2. The first thing we note is that many of the elements defined in figure 1 are not needed. Also significant: much of the semantically interesting information, such as the value of AttributeID, is found in content rather than markup. Because such values are likely to repeat, leaving them as content is error-prone.

Figure 1
<xsd:element name="OrderLine" type="OrderLineType"/>
<xsd:complexType name="OrderLineType" id="UBL000377">
    <xsd:sequence>
        <xsd:element ref="BuyersID" id="UBL000378"/>
        <xsd:element ref="SellersID" id="UBL000379" minOccurs="0"/>
        <xsd:element ref="LineExtensionAmount" id="UBL000380" minOccurs="0"/>
        <xsd:element ref="Quantity" id="UBL000381" minOccurs="0"/>
        <xsd:element ref="MinimumQuantity" id="UBL000382" minOccurs="0"/>
        <xsd:element ref="MaximumQuantity" id="UBL000383" minOccurs="0"/>
        <xsd:element ref="MaximumBackorderQuantity" id="UBL000384" minOccurs="0"/>
        <xsd:element ref="MinimumBackorderQuantity" id="UBL000385" minOccurs="0"/>
        <xsd:element ref="SubstitutionStatusCode" id="UBL000386" minOccurs="0"/>
        <xsd:element ref="DestinationParty" id="UBL000387" minOccurs="0"/>
        <xsd:element ref="Item" id="UBL000388"/>
        <xsd:element ref="DeliveryRequirement" id="UBL000389"/>
        <xsd:element ref="OrderedShipment" id="UBL000390" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="AllowanceCharge" id="UBL000391" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="BasePrice" id="UBL000392" minOccurs="0"/>
        <xsd:element ref="AlternativeOrderLine" id="UBL000393" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="SubstituteForOrderLine" id="UBL000394" minOccurs="0" maxOccurs="unbounded"/>
     </xsd:sequence>
</xsd:complexType>
<xsd:element name="Item" type="ItemType"/>
<xsd:complexType name="ItemType" id="UBL000285">
    <xsd:sequence>
        <xsd:element ref="ID" id="UBL000286"/>
        <xsd:element ref="Description" id="UBL000287" minOccurs="0"/>
        <xsd:element ref="PackQuantity" id="UBL000288" minOccurs="0"/>
        <xsd:element ref="PackSizeQuantity" id="UBL000289" minOccurs="0"/>
        <xsd:element ref="FromCatalogIndicator" id="UBL000290" minOccurs="0"/>
        <xsd:element ref="BuyersItemIdentification" id="UBL000291" minOccurs="0"/>
        <xsd:element ref="SellersItemIdentification" id="UBL000292" minOccurs="0"/>
        <xsd:element ref="ManufacturersItemIdentification" id="UBL000293" minOccurs="0"/>
        <xsd:element ref="StandardItemIdentification" id="UBL000294" minOccurs="0"/>
        <xsd:element ref="CatalogueItemIdentification" id="UBL000295" minOccurs="0"/>
        <xsd:element ref="ReferencedCatalogue" id="UBL000296" minOccurs="0"/>
        <xsd:element ref="OriginCountry" id="UBL000297" minOccurs="0"/>
        <xsd:element ref="CommodityClassification" id="UBL000298" minOccurs="0"/>
        <xsd:element ref="SalesConditions" id="UBL000299" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="HazardousItem" id="UBL000300" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="Tax" id="UBL000301" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="BasePrice" id="UBL000302" maxOccurs="unbounded"/>
   </xsd:sequence>
</xsd:complexType>
Figure 2
<cat:OrderLine>
    <cat:BuyersID>A</cat:BuyersID>
    <cat:SellersID/>
    <cat:Quantity unitCode="unit">2</cat:Quantity>
    <cat:Item>
      <cat:ID></cat:ID>
      <cat:SellersItemIdentification>
        <cat:ID>236WV</cat:ID>
        <cat:PhysicalAttribute>
          <cat:AttributeID>wood</cat:AttributeID>
          <cat:DescriptionID>soft</cat:DescriptionID>
        </cat:PhysicalAttribute>
        <cat:PhysicalAttribute>
          <cat:AttributeID>finish</cat:AttributeID>
          <cat:DescriptionID>primed</cat:DescriptionID>
        </cat:PhysicalAttribute>
        <cat:PhysicalAttribute>
          <cat:AttributeID>fittings</cat:AttributeID>
          <cat:DescriptionID>satin</cat:DescriptionID>
        </cat:PhysicalAttribute>
        <cat:PhysicalAttribute>
          <cat:AttributeID>glazing</cat:AttributeID>
          <cat:DescriptionID>single</cat:DescriptionID>
        </cat:PhysicalAttribute>
      </cat:SellersItemIdentification>
      <cat:BasePrice>
        <cat:PriceAmount currencyID="GBP">0.00</cat:PriceAmount>
      </cat:BasePrice>
    </cat:Item>
  </cat:OrderLine>

As this is a base schema, however, there's little choice. A base schema must show the most abstract, common information among its derived schemas. It was a goal of XSD to enable the construction of derived schemas whose instances are valid against both derived and base types. As an example of such derivation we have the fragment in figure 3. Here we have a JoineryLine element - a restriction of OrderLine containing just those elements of OrderLine found in the JoineryOrder.xml document. The element has another difference, though. Rather than have an Item child, it has a Cabinet - a very precise kind of Item one might find in a catalog, rather than a general one, like OrderLine.

Figure 3
<Cabinet>
    <cat:BuyersID>A</cat:BuyersID>
    <cat:SellersID/>
    <cat:Quantity unitCode="unit">2</cat:Quantity>
    <cat:Item>
      <cat:ID></cat:ID>
      <JoineryItemID>
        <cat:ID>236WV</cat:ID>
        <Pine><WoodID/><SoftID/></Pine>
        <Primed><FinishID/><PrimedID/></Primed>
        <SatinFittings><FittingsID/><SatinID/></SatinFittings>
        <SingleGlaze><GlazingID/><SingleID/></SingleGlaze>
      </JoineryItemID>
      <cat:BasePrice>
        <cat:PriceAmount currencyID="GBP">0.00</cat:PriceAmount>
      </cat:BasePrice>
    </cat:Item>
  </Cabinet>

While this document might be better suited for a purchase order in the specific domain, it poses a problem for application development. Because domain semantics have moved into the structure, that is, into the elements, the element names are now very different. An XSLT1.0 stylesheet designed for the more general case would need to be largely rewritten. However, the element() predicate allows a stylesheet written for the base element types to be directly applied to the new document with equivalent results.

The element() predicate takes two arguments. Each can be either a qualified name or a wildcard - "*". The first argument represents an XSD global element and will match any element in the substitution group of which the argument is the head. The second argument represents an XSD type - simple or complex - and will match any element whose type is derived, by zero or more steps, from the argument. In both cases the wildcard matches all elements, and when both arguments are given, an element must match both.
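
The matching rule just described can be modeled outside XSLT. The following is a minimal Python sketch, not an XSLT processor: the three lookup tables are transcribed by hand from the Joinery fragment in figure 4, and the helper names (`_reaches`, `element`) are invented for illustration.

```python
# Toy model of element(E, T) as described above; not an XSLT processor.
# Schema facts are transcribed by hand from the Joinery fragment (figure 4).

# member element -> head of its substitution group (None for a base element)
SUBSTITUTION_HEAD = {
    "cat:OrderLine": None,
    "JoineryLine": "cat:OrderLine",
    "Cabinet": "JoineryLine",
    "cat:Item": None,
    "JoineryItem": "cat:Item",
}

# type -> base type it is derived from (None for a root type)
BASE_TYPE = {
    "cat:OrderLineType": None,
    "JoineryLineType": "cat:OrderLineType",
    "cat:ItemType": None,
    "JoineryItemType": "cat:ItemType",
}

# declared type of each global element
ELEMENT_TYPE = {
    "cat:OrderLine": "cat:OrderLineType",
    "JoineryLine": "JoineryLineType",
    "Cabinet": "JoineryLineType",
    "cat:Item": "cat:ItemType",
    "JoineryItem": "JoineryItemType",
}


def _reaches(start, target, parent_of):
    """True if target is start itself or is reached by following parent links."""
    current = start
    while current is not None:
        if current == target:
            return True
        current = parent_of.get(current)
    return False


def element(node_name, head="*", type_name="*"):
    """Does the element called node_name satisfy element(head, type_name)?"""
    name_ok = head == "*" or _reaches(node_name, head, SUBSTITUTION_HEAD)
    type_ok = type_name == "*" or _reaches(ELEMENT_TYPE[node_name], type_name, BASE_TYPE)
    return name_ok and type_ok
```

Under this model, `element("Cabinet", "cat:OrderLine")` holds because Cabinet reaches OrderLine through JoineryLine, while `element("cat:Item", "cat:OrderLine")` does not. Every element trivially matches itself, which is the "zero or more steps" case.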

To see how this works, figure 4 contains a part of the Joinery schema describing the document in fig 3. Each type in this schema is derived from a type in the base schema and each element is in the substitution group of an element in the base. For example, JoineryLineType restricts OrderLineType to those elements which will be actually used. The Cabinet is in the substitution group of OrderLine.

Figure 4
<xsd:element name="JoineryLine" substitutionGroup="cat:OrderLine" type="JoineryLineType"/>
    <xsd:complexType name="JoineryLineType" id="UBL000377">
        <xsd:complexContent>
            <xsd:restriction base="cat:OrderLineType">
                <xsd:sequence>
                    <xsd:element ref="cat:BuyersID" id="UBL000378"/>
                    <xsd:element ref="cat:SellersID" id="UBL000379" minOccurs="0"/>
                    <xsd:element ref="cat:Quantity" id="UBL000381" minOccurs="0"/>
                    <xsd:element ref="cat:Item" id="UBL000388"/>
                    <xsd:element ref="cat:DeliveryRequirement" id="UBL000389"/>
                </xsd:sequence>
            </xsd:restriction>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="Cabinet" substitutionGroup="JoineryLine" type="JoineryLineType"/>
    <xsd:element name="JoineryItem" substitutionGroup="cat:Item" type="JoineryItemType"></xsd:element>
    <xsd:complexType name="JoineryItemType" id="UBL000285">
        <xsd:complexContent>
            <xsd:restriction base="cat:ItemType">
                <xsd:sequence>
                    <xsd:element ref="cat:ID" id="UBL000286"/>
                    <xsd:element ref="cat:Description" id="UBL000287" minOccurs="0"/>
                    <xsd:element ref="cat:BuyersItemIdentification" id="UBL000291" minOccurs="0"/>
                    <xsd:element ref="cat:SellersItemIdentification" id="UBL000292" minOccurs="0"/>
                    <xsd:element ref="cat:Tax" id="UBL000301" minOccurs="0" maxOccurs="unbounded"/>
                    <xsd:element ref="cat:BasePrice" id="UBL000302" maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:restriction>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="JoineryAttribute" type="JoineryAttributeType" 
    substitutionGroup="cat:PhysicalAttribute"/>
    <xsd:complexType name="JoineryAttributeType" id="UBL000447">
        <xsd:complexContent>
            <xsd:restriction base="cat:PhysicalAttributeType">
                <xsd:sequence>
                    <xsd:element ref="cat:AttributeID" id="UBL000448"/>
                    <xsd:element ref="cat:DescriptionID" id="UBL000450" minOccurs="0"/>
                    <xsd:element ref="cat:Description" id="UBL000451" minOccurs="0"/>
                </xsd:sequence>
            </xsd:restriction>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="JoineryItemID" substitutionGroup="cat:SellersItemIdentification" 
    type="JoineryItemIDType"></xsd:element>
    <xsd:complexType name="JoineryItemIDType">
        <xsd:complexContent>
            <xsd:restriction base="cat:SellersItemIdentificationType">
                <xsd:sequence>
                    <xsd:element ref="cat:ID" id="UBL000559"/>
                    <xsd:element ref="cat:Extension" id="UBL000560" minOccurs="0"/>
                    <xsd:element ref="JoineryAttribute" minOccurs="0" maxOccurs="unbounded"/>
                    <xsd:element ref="cat:ItemMeasurement" id="UBL000562" minOccurs="0" maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:restriction>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:element name="WoodID" substitutionGroup="cat:AttributeID" fixed="wood"/>
    <xsd:element name="SoftID" substitutionGroup="cat:DescriptionID" fixed="soft"/>
    <xsd:element name="WoodAbstract" substitutionGroup="JoineryAttribute"/>
    <xsd:element name="Pine" substitutionGroup="WoodAbstract">
        <xsd:complexType>
            <xsd:complexContent>
                <xsd:restriction base="JoineryAttributeType">
                    <xsd:sequence>
                        <xsd:element ref="WoodID"/>
                        <xsd:element ref="SoftID"/>
                    </xsd:sequence>
                </xsd:restriction>
            </xsd:complexContent>
        </xsd:complexType>
    </xsd:element>

In this manner we can derive domain specific types to lock down much of the variable information of the base schema, thereby ensuring tighter validation and better support for form and code generation utilities while still ensuring the appearance of information for applications targeting higher levels in the derivation hierarchy.

Cabinet, for example, is constrained to only have a JoineryItemType which in turn requires a JoineryItem, etc., down to the Pine element, which finally contains WoodID and SoftID (the author abjures any knowledge of realistic wood properties). At every stage elements and types are in a restriction relation - where actual content is allowed (inside AttributeID and DescriptionID) it is fixed and need not appear in a document.

There is a significant weakness in XSD's restriction mechanism. We cannot restrict the sequence of JoineryAttributes to exactly those that can appear in a Cabinet; we can only change the minOccurs, maxOccurs, and ref attribute values. This is not as "tight" as we'd like, as it allows other elements from the schema to appear here beyond only those allowed in a Cabinet.

A glance at a fragment from Holman's original XSLT1.0 stylesheet library in figure 5 shows it is loaded with element and attribute names from the schema. The use of these names, a necessity in XSLT1.0, fundamentally prevents this stylesheet from being used on the document from the derived schema, as that document no longer contains many of those names. For example, the <apply-templates> elements for processing columns each find their source by looking for a PhysicalAttribute containing an AttributeID whose content is the desired value, such as "cat:PhysicalAttribute[cat:AttributeID='wood']/cat:DescriptionID". These elements no longer appear in the derived document, preventing reuse. This is not an issue of the stylesheet's design - it is a reflection of the language in which it is written.

Figure 5
<xsl:template match="cat:OrderLine">
...
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="cat:PhysicalAttribute[cat:AttributeID='wood']/cat:DescriptionID"/>
          </block>
        </table-cell>
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="cat:PhysicalAttribute[cat:AttributeID='finish']/cat:DescriptionID"/>
          </block>
        </table-cell>
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="cat:PhysicalAttribute[cat:AttributeID='fittings']/cat:DescriptionID"/>
          </block>
        </table-cell>
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="cat:PhysicalAttribute[cat:AttributeID='glazing']/cat:DescriptionID"/>
          </block>
        </table-cell>
      </xsl:for-each>
...
</xsl:template>

However, everything in the derived schema is linked by XSD mechanisms back to the UBL schema. All of that information is available, just not accessible in XSLT1. The addition of element() to XPath2 finally opens up that information to the stylesheet. We systematically substituted calls to element() for element types throughout the stylesheet - every instance of an element foo:bar in an XPath became element(foo:bar,*). Because every element is in its own substitution group, this works on the original document without alteration to produce the same output as before. More importantly for our current effort, this also works directly on the derived document, producing the same output. Given the disparities between the two documents and the heavy use of XSD mechanisms as the link, this is a strong testimony to the importance of these mechanisms.

Figure 6
<xsl:template match="*[element(cat-0.7:OrderLine,*)]">
...
      <xsl:for-each select="*[element(cat-0.7:Item,*)]/*[element(cat-0.7:SellersItemIdentification,*)]">
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="*[element(cat-0.7:PhysicalAttribute,*)]
                           [*[element(cat-0.7:AttributeID,*)]='wood']/
                           *[element(cat-0.7:DescriptionID,*)]"/>
          </block>
        </table-cell>
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="*[element(cat-0.7:PhysicalAttribute,*)]
                           [*[element(cat-0.7:AttributeID,*)]='finish']/
                           *[element(cat-0.7:DescriptionID,*)]"/>
          </block>
        </table-cell>
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="*[element(cat-0.7:PhysicalAttribute,*)]
                           [*[element(cat-0.7:AttributeID,*)]='fittings']/
                           *[element(cat-0.7:DescriptionID,*)]"/>
          </block>
        </table-cell>
        <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
          <block>
            <xsl:apply-templates select="*[element(cat-0.7:PhysicalAttribute,*)]
                           [*[element(cat-0.7:AttributeID,*)]='glazing']/
                           *[element(cat-0.7:DescriptionID,*)]"/>
          </block>
        </table-cell>
      </xsl:for-each>
...
</xsl:template>
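
The systematic substitution of element(foo:bar,*) for foo:bar is mechanical enough to sketch as a text rewrite. The Python fragment below is an illustration only, not the procedure actually used on the stylesheet. It assumes every prefixed name in the expression is an element name in a child step. Attributes, explicit axes, and prefixed function names are out of scope, and the function name `polymorphize` is hypothetical.

```python
import re

# Either a quoted string literal (left untouched) or a prefixed QName.
_TOKEN = re.compile(
    r"""'[^']*'|"[^"]*"|(?P<qname>[A-Za-z_][\w.\-]*:[A-Za-z_][\w.\-]*)"""
)


def polymorphize(xpath):
    """Rewrite each prefixed element name foo:bar as *[element(foo:bar,*)]."""
    def rewrite(match):
        qname = match.group("qname")
        if qname is None:
            # A string literal: copy it through unchanged.
            return match.group(0)
        return f"*[element({qname},*)]"

    return _TOKEN.sub(rewrite, xpath)


original = "cat:PhysicalAttribute[cat:AttributeID='wood']/cat:DescriptionID"
rewritten = polymorphize(original)
```

Applied to the select expression from figure 5, this yields the same shape as the expressions in figure 6, apart from the line breaks and the version-specific prefix.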

An important feature needed to complete this picture is the use of priorities. They've not been necessary to this scenario up to this point because

  • The base schema doesn't use substitution groups, so the semantics are "flat".
  • We want the derived schema to have the exact same behavior as the base schema.

If we change either of these features, then priorities become important. Derivation is commonly used because there's a desire to add more specific semantics to the meaning of the derived element. Both Item and JoineryItem (and any members of their substitution groups) match element(Item,*) and therefore are processed by the same template. However, if we wish for JoineryItem to be processed differently, then we need to use priorities - a template with element(JoineryItem,*) will match the element, but so will the template for the base type. The only way to distinguish them is to systematically make the priority for more specific templates higher than for less specific ones. In this case, the priority for the default template (from the base stylesheet) would be 1 and the priority for a JoineryItem-specific template would be 2. Note that a JoineryItem template with additional functionality, such as displaying a picture of the item, would be used if available; otherwise processing would default back to the current stylesheet. This is further described in [POLY].
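
The selection rule can be illustrated with a small Python model. It is a sketch of the idea only: the template records, their labels, and the function names are invented, and the substitution group table takes Item and JoineryItem from figure 4.

```python
# Toy model of template selection by priority; not an XSLT processor.

# member element -> head of its substitution group (from figure 4)
SUBSTITUTION_HEAD = {"cat:Item": None, "JoineryItem": "cat:Item"}


def in_group(name, head):
    """True if name is head or a (transitive) member of head's substitution group."""
    while name is not None:
        if name == head:
            return True
        name = SUBSTITUTION_HEAD.get(name)
    return False


def select_template(templates, name):
    """Pick the matching template with the highest priority, or None."""
    candidates = [t for t in templates if in_group(name, t["head"])]
    return max(candidates, key=lambda t: t["priority"], default=None)


# Hypothetical templates: the base one matches element(cat:Item,*),
# the specific one matches element(JoineryItem,*).
BASE = {"head": "cat:Item", "priority": 1, "label": "base Item layout"}
SPECIFIC = {"head": "JoineryItem", "priority": 2, "label": "Joinery layout with picture"}
```

With both templates loaded, a JoineryItem selects the priority 2 template. With only the base stylesheet loaded, the same element falls back to the priority 1 template, which is the defaulting behavior described above.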

Supporting Incompatible Versions

The element() predicate is extremely useful for supporting extensibility when structures in the new schema are derived from structures in the old. Using the same stylesheet to handle similar schemas that are unrelated by XSD's automated mechanisms cannot be done this way without adding to the schema. For example, all of Holman's stylesheets were designed for UBL 0.7. The UBL 1.0 beta release is in a completely new namespace - from the perspective of XSD, the two releases are totally unrelated. The use of a new namespace for 1.0-beta means none of the XPaths apply. In addition, the locations of some constructs have moved. However, using the traditional software engineering practice of abstraction, we can still enable a single stylesheet to support multiple, apparently incompatible, schemas.

The problem is apparent when looking at any stylesheet, although we will continue with our current example. The stylesheet is riddled with XPath expressions walking over a very specific structure in a very specific schema. This strongly ties the stylesheet to the structure of a particular schema and makes it difficult to apply to another schema. This is scarcely the fault of programmers using XSLT1.0, though. The semantics of the language force this kind of construction because it is impossible, in XSLT1.0, to use any other means than an immediately inlined XPath to locate nodes of the input. If the XPath expression is abstracted out, such as being accessed from a named template, the result cannot be a tree of nodes from the input. The key to greater abstraction in XSLT2.0 is the ability to return source nodes from templates and other constructions. (Some of this, of course, can be offset by using axes, such as ancestor or descendant, that are more loosely tied to document structure.)

Creating the right level of abstraction requires separating algorithms and XPaths. In the first case, if a template body performs an important part of the computation, such as laying out important parts of the document, then that algorithm will need to function on all schemas to which the stylesheet is applied. Therefore a match statement would need to match a set of elements unknown when it is written. Rather than being called directly, such templates are better called by name whenever an appropriate element is found in the input. In the second case, XPath expressions tied to particular document structures can be removed from the guts of templates and placed in a separate set of accessor templates, where the accessors vary by schema. By abstracting out these latter XPaths, they can be organized into separate modules imported when a schema of the appropriate type is encountered, without changing the core of the algorithm.

An example of the first pattern can be found in the routine to process each OrderLine element. This has a match value specific for the 0.7 schema, as shown in figure 6. To have it support other versions of UBL, such as 1.0, would require turning the match statement into a never closed or-statement. However, we can abstract away from this by rewriting it as in figure 7, where the template is named "doOrderLine" and there are two new templates preceding it, one for each UBL version, calling "doOrderLine" for elements of the appropriate type. There are a number of significant templates in the complete stylesheet that can be altered in this way, such as the main routine. The "dispatch" templates for a single schema would normally be kept in separate files and imported here, rather than being all mixed in.

Figure 7
                    <xsl:template match="*[element(cat-1.0beta:OrderLine,*)]">
                        <xsl:call-template name="doOrderLine"/>
                    </xsl:template>
                    
                    <xsl:template match="*[element(cat-0.7:OrderLine,*)]">
                        <xsl:call-template name="doOrderLine"/>
                    </xsl:template>
                    
                    <xsl:template name="doOrderLine"> 
                        ... 
                        <xsl:variable name="sellersItemIdent" as="element()*">
                              <xsl:apply-templates select="." mode="getSellersItemIdent"/>
                        </xsl:variable>
                        <xsl:for-each select="$sellersItemIdent">
                            <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
                                <block>
                                    <xsl:variable name="wood" as="element()*">
                                        <xsl:apply-templates select="." mode="getItemProperty">
                                            <xsl:with-param name="property">wood</xsl:with-param>
                                        </xsl:apply-templates>
                                    </xsl:variable>
                                    <xsl:apply-templates select="$wood"/>
                                </block>
                            </table-cell>
                            <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
                                <block>
                                    <xsl:variable name="finish" as="element()*">
                                        <xsl:apply-templates select="." mode="getItemProperty">
                                            <xsl:with-param name="property">finish</xsl:with-param>
                                        </xsl:apply-templates>
                                    </xsl:variable>
                                    <xsl:apply-templates select="$finish"/>
                                </block>
                            </table-cell>
                            <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
                                <block>
                                    <xsl:variable name="fittings" as="element()*">
                                        <xsl:apply-templates select="." mode="getItemProperty">
                                            <xsl:with-param name="property">fittings</xsl:with-param>
                                        </xsl:apply-templates>
                                    </xsl:variable>
                                    <xsl:apply-templates select="$fittings"/>
                                </block>
                            </table-cell>
                            <table-cell border="solid 1pt" number-rows-spanned="{$rows}">
                                <block>
                                    <xsl:variable name="glazing" as="element()*">
                                        <xsl:apply-templates select="." mode="getItemProperty">
                                            <xsl:with-param name="property">glazing</xsl:with-param>
                                        </xsl:apply-templates>
                                    </xsl:variable>
                                    <xsl:apply-templates select="$glazing"/>
                                </block>
                            </table-cell>
                        </xsl:for-each>
                        ...
                    </xsl:template>

There are also numerous examples of long, complex XPath expressions inside templates. The second pattern is to filter these out. The body of "doOrderLine", above, contains no XPaths, although the original in figure 5 and the polymorphic version in figure 6 do. Instead, the template calls special "accessor" templates, shown in figure 8. Each of these templates returns nodes from the source tree for further processing. They could not have been written in XSLT1.0, although a similar, but less powerful, style is possible there. XSLT1.0 allows an author to "chain" templates together, so that a named template can call another that locates some nodes, which in turn calls another template to process them. This provides some of the functionality, but is clearly more awkward, with the two types of templates completely entangled. Nevertheless, this pattern has been used, demonstrating the need for it.

The first template is a straightforward accessor - it returns the SellersItemIdentification element of an OrderLine element. In the doOrderLine template this is assigned to the variable sellersItemIdent. The second template is a generalized property accessor, parameterized by the property name.

Figure 8
<!-- Accessor: returns the SellersItemIdentification element of an OrderLine. -->
<xsl:template match="*[matt:element(cat-1.0beta:OrderLine,*)]" mode="getSellersItemIdent">
    <xsl:sequence
        select="*[matt:element(cat-1.0beta:LineItem,*)]/
                *[matt:element(cat-1.0beta:Item,*)]/
                *[matt:element(cat-1.0beta:SellersItemIdentification,*)]"/>
</xsl:template>

<!-- Generalized property accessor, parameterized by the property name. -->
<xsl:template match="*[matt:element(cat-1.0beta:SellersItemIdentification,*)]" mode="getItemProperty">
    <xsl:param name="property"/>
    <xsl:sequence
        select="*[matt:element(cat-1.0beta:PhysicalAttribute,*)]
                    [*[matt:element(cat-1.0beta:AttributeID,*)]=$property]/
                *[matt:element(cat-1.0beta:Description,*)]"/>
</xsl:template>
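
For contrast, the XSLT1.0 "chaining" style might look roughly like the following sketch. It is not taken from the original stylesheets: the namespace URI, the mode names, the cat:ID child and the output markup are all assumptions made for illustration. Because a 1.0 template cannot hand source nodes back to its caller, the locating template has to apply the next template itself, so the path expression and the choice of what happens next end up in the same place.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:cat="urn:example:catalogue"
        exclude-result-prefixes="cat">
      <!-- The cat namespace URI above is hypothetical. -->

      <!-- Caller: starts the chain for each order line. -->
      <xsl:template match="cat:OrderLine">
        <row>
          <xsl:apply-templates select="." mode="findSellersItemIdent"/>
        </row>
      </xsl:template>

      <!-- Locator: finds the nodes but cannot return them. It passes them
           on to a fixed next step, since a mode name cannot be computed
           at run time. -->
      <xsl:template match="cat:OrderLine" mode="findSellersItemIdent">
        <xsl:apply-templates
            select="cat:LineItem/cat:Item/cat:SellersItemIdentification"
            mode="showSellersItemIdent"/>
      </xsl:template>

      <!-- Processor: the step the locator is hard-wired to.
           cat:ID is an assumed child element. -->
      <xsl:template match="cat:SellersItemIdentification"
                    mode="showSellersItemIdent">
        <cell><xsl:value-of select="cat:ID"/></cell>
      </xsl:template>

    </xsl:stylesheet>

Reusing the locator with a different processing step would mean writing a second locator in another mode, which is the entanglement the accessor templates of figure 8 avoid.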

An important consideration in using these techniques is the trade-off between cost and genericity. Clearly some work needs to be done for every new schema to be supported by a generic stylesheet. At the very least, the different XPath accessor templates may need to be written. The number of these will vary from stylesheet to stylesheet, and how many of them will be reusable in different stylesheets for a similar problem domain remains to be seen. Therefore it will sometimes be the case that simply rewriting some parts of a stylesheet for a new schema is as easy or easier than maintaining its genericity. As is often the case, the extra effort won't pay off until at least the second version.

Conclusion

As a side effect, features in XSLT2.0 vastly improve the ability to create extensible and reusable stylesheets. The element() predicate, in particular, finally makes it possible for XSLT to exploit the OO features approved for XSD in 2001. We anticipate these features being used extensively by stylesheet developers interested in extensibility and reuse. We also see that the XSLT1.0 pattern of chaining templates is strengthened by the ability to return tree nodes and sequences from a function.

There are a few recommendations that present themselves from the experience of using these features in their current form.

  • While the element() predicate is essential for using XSD inheritance, it is unwieldy as a predicate. This is even more true when there are multiple templates matching an element or complex type at different levels in a derivation/substitution group chain. Right now this can be handled by using priorities - templates matching more derived elements or types should have higher priorities (more specific, in OO terminology). However, this is tedious, error-prone, and can only apply to the last axis in an XPath. A far more powerful technique would be to make element and type into XPath axes refining the self axis, with priorities sorted by specificity (more recent ancestors in the hierarchy have priority over more distant ones). This would have the same effect as priorities for the last axis in the path, but would also "do the right thing" for any intermediate axes.
  • XSD restriction is too strict to allow many interesting content models, so restrictions cannot be as close to the ideal content model of the restricted type as one would like.
  • Function composition, where the results of one function are passed directly to another function, is difficult in XSLT. The method used here is to call the first function and assign its result to a (temporary) variable. A more functional syntax would allow the content of an <apply-templates> element to be itself a function application, such as another <apply-templates>, rather than requiring the input to be specified by a select.
    This particularly helps for the "horizontal" reuse, where the different versions are not linked in a derivation or substitution group chain.
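
The priority technique from the first recommendation can be sketched with the element(*, T) test as it may be written in XSLT2.0 match patterns, assuming a schema-aware processor and a source document validated against the schema. The ex: namespace, the two types and the output element names are invented for illustration and are not part of UBL. As far as we can tell, the two patterns would otherwise have the same default priority, which is why the explicit priority attributes are needed.

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:ex="urn:example:items"
        exclude-result-prefixes="xs ex">

      <!-- Hypothetical inline schema: GlazedItemType extends ItemType. -->
      <xsl:import-schema namespace="urn:example:items">
        <xs:schema targetNamespace="urn:example:items"
                   elementFormDefault="qualified">
          <xs:complexType name="ItemType">
            <xs:sequence>
              <xs:element name="Name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
          <xs:complexType name="GlazedItemType">
            <xs:complexContent>
              <xs:extension base="ex:ItemType">
                <xs:sequence>
                  <xs:element name="Glazing" type="xs:string"/>
                </xs:sequence>
              </xs:extension>
            </xs:complexContent>
          </xs:complexType>
          <xs:element name="Item" type="ex:ItemType"/>
          <xs:element name="GlazedItem" type="ex:GlazedItemType"
                      substitutionGroup="ex:Item"/>
        </xs:schema>
      </xsl:import-schema>

      <!-- Base type: lower priority. This pattern also matches elements
           whose type is derived from ItemType. -->
      <xsl:template match="element(*, ex:ItemType)" priority="1">
        <item><xsl:value-of select="ex:Name"/></item>
      </xsl:template>

      <!-- Derived type: higher priority, so it is chosen for elements of
           type GlazedItemType. xsl:next-match then invokes the base rule. -->
      <xsl:template match="element(*, ex:GlazedItemType)" priority="2">
        <glazed-item>
          <xsl:next-match/>
          <glazing><xsl:value-of select="ex:Glazing"/></glazing>
        </glazed-item>
      </xsl:template>

    </xsl:stylesheet>

Each further level of derivation needs another hand-assigned priority, and the priorities only govern the final step of a pattern, which is the tedium the first recommendation describes.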


Acknowledgments

The author would particularly like to thank Ken Holman for bravely consenting to the use of his stylesheets for this effort.


Bibliography

[HOLMAN] G. Ken Holman, Using Crane's Universal Business Language (UBL) Stylesheet Library, 2003, http://www.cranesoftwrights.com/resources/ublss/

[POLY] Matthew Fuchs, "Making W3C XML Schema's Object Oriented Features in SAX/XSL/DOM", Proceedings, XML Conference and Exposition, Philadelphia, 2003, Idealliance, http://www.idealliance.org/proceedings/xml03/

[UBL] The UBL effort at OASIS is hosted at http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl

[UBL07] OASIS UBL Library Content SC, Universal Business Language — Library Content — 0p70 Public Review, Oasis Open, 2003, http://oasis-open.org/committees/ubl/lcsc/0p70/

[UBL1B] OASIS UBL Library Content SC, Universal Business Language 1.0 Beta – Committee Draft, http://www.oasis-open.org/committees/ubl/lcsc/UBLv1-beta/

[XPATH] Anders Berglund, et alia, editors, XML Path Language (XPath) 2.0, W3C Working Draft, 2003, http://www.w3.org/TR/xpath20/

[XREQS] Steve Muench and Mark Scardina, eds., XSLT Requirements, Version 2.0, W3C, 2001, http://www.w3.org/TR/xslt20req

[XSD] Henry Thompson et alia, XML Schema Part 1: Structures, W3C, 2001, http://www.w3.org/TR/xmlschema-1/

[XSLT1] James Clark, editor, XSL Transformations (XSLT) Version 1.0, W3C, 1999, http://www.w3.org/TR/xslt

[XSLT2] Michael Kay, ed., XSL Transformations (XSLT) Version 2.0, W3C Working Draft, 2003, http://www.w3.org/TR/xslt20/


