XTL: An XML Transformation Language and XSLT generator for XTL

Makoto Onizuka
oni@acm.org

Abstract

This paper describes XTL (an XML Transformation Language), a new query and transformation language. XTL combines an output-driven and a schema-driven approach: 1) the output structure of a transformation is specified using an XML schema language (so far we have chosen DTD), and 2) well-formed input XML documents are mapped to that output structure using XPath expressions embedded in the DTD.

XTL has a simple syntax because it is declarative and adds only a few extensions. Users only have to understand the DTD and XPath specifications plus the few extensions and rules of XTL. XTL is powerful because it offers a sufficient set of operations for extracting and transforming XML data.

This paper also describes an XTL processor that translates XTL expressions into XSLT expressions. This generator is useful for XSLT users who want to transform XML, because XTL is much simpler than XSLT. They can use the generator as a front-end tool for XSLT.

Keywords: Querying; Transforming; XPath; XSLT

Makoto Onizuka

Makoto Onizuka received his B.S. degree in computer science from the Tokyo Institute of Technology in 1991. He joined NTT Information Systems Laboratories in the same year, where he was involved in research on relational database design, visual data analysis tools, and object-relational database management systems for multimedia retrieval.

He is currently involved in research on XML data management systems at NTT Cyber Space Laboratories and, since 2000, at the University of Washington as a visiting scholar.

He is a member of ACM.

Makoto Onizuka [Research Engineer; NTT CyberSpace Laboratories]

Extreme Markup Languages 2001® (Montréal, Québec)

Copyright © 2001 Makoto Onizuka. Reproduced with permission.

What is XTL?

As data exchange (B2B and B2C) attracts growing attention, not only the standardization of XML vocabularies such as SMIL, MML, and B-XML but also transformation technology is becoming important for exchanging XML data that have different structures. XTL is a transformation language that outputs XML data by querying and transforming a collection of input XML data.

The related existing languages are:

XML query languages (XQL [XQL FAQ], XML-QL [A query language for XML], Quilt [Quilt: an XML Query Language])

XSL Transformations (XSLT [XSL Transformations (XSLT) Version 1.0] [XSL Transformations (XSLT) Version 1.1])

However, these languages have several issues.

They are not easy to learn or use because of their syntactic complexity (XSLT and Quilt, for example).

Their functionality is not sufficient (for example, XQL lacks a restructuring operation and XML-QL lacks structure-preserving queries).

XTL solves these issues. XTL's main functions are as follows.

To specify the output structure of a transformation using an XML schema language. So far we have chosen DTD because it is the most popular XML schema language. XML Schema and RELAX are good candidates for future schema languages for XTL.

To specify mapping rules between well-formed input XML data and the output structure using XPath [XML Path Language (XPath) Version 1.0].

Therefore, XTL is easy to learn for people who already know DTD and XPath.

In addition, XTL's fundamental operations are based on the relational query model, as follows.

Table 1: XTL's fundamental operations use a relational query model
Operation | Relational query model | XTL expression
--- | --- | ---
Projection | SELECT clause | Projected elements or attributes are specified using DTD
Selection | WHERE clause | XPath's selection is used
Rename tag | FROM clause | Renamed element or attribute is specified using DTD
Set operation (union, difference, intersect) | UNION clause, - clause | Using XTL extension +, -, *, / operations for node-set specified by XPath. XPath 2.0 [XPath Language Requirement Version 2.0] will support these operations.
Cartesian product (join) | FROM clause (and WHERE clause, table1.column1 = table2.column2) | Using XSLT function document()
Sort | ORDER BY clause | Using XTL extension ORDER BY clause.
Eliminate duplicated node | DISTINCT or GROUP BY clause | Using XTL extension GROUP BY clause

XTL also includes several important XML query functions, listed below. The table is based on the paper [XML Query Languages: Experiences and Exemplars], with several other important functions added.

Table 2: XTL includes XML query functions
Function | Description | XTL expression
--- | --- | ---
Structure preserving | A query that preserves the structure of the input XML data | Specify a DTD and XPath that preserve the structure of the input XML data
Changing structure (including flattening) | A query that changes the structure of the input XML data | XTL can change the structure of the input XML data by using DTD and XPath
Tag variable | Keeping the same tag name as in the input XML | Use $variable as the tag in the DTD
External function | Invoking a user-defined function | An XPath selection condition can use a user-defined function
Specifying all substructures | Extracting all substructures of a specified tag | Use ANY in the DTD
Recursive query | A query that is executed on a recursive structure | Define a recursive structure in the DTD or a recursive selection in XPath
Reference (data models and navigations) | Referring to a referenced tag | Refer to a referenced tag using XPath

Examples of XTL Expressions

This section describes several examples to explain the XTL fundamental operations and functions.

Projection while preserving structure

The first example projects several tags of XML data while preserving structure. The input XML data, bib.xml, is shown below (taken from the XML-QL examples).

Figure 1: Sample document: bibliography (bib.xml)
<?xml version="1.0" ?> 
<bib> 
  <book year="1995"> 
     <!-- A good introductory text --> 
     <title>An Introduction to Database Systems</title> 
     <author><lastname>Date</lastname></author> 
     <publisher><name>Addison-Wesley</name></publisher> 
  </book> 
  <book year="1998"> 
     <title>Foundations for Object/Relational Databases</title> 
     <author><lastname>Date</lastname></author> 
     <author><lastname>Darwen</lastname></author> 
     <publisher><name>Addison-Wesley</name></publisher> 
  </book> 
  <book year="1999"> 
     <title>Data on the Web: from Relations to Semistructured Data &amp; XML</title>
     <author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author> 
     <author><firstname>Peter</firstname><lastname>Buneman</lastname></author> 
     <author><firstname>Dan</firstname><lastname>Suciu</lastname></author> 
     <publisher><name>Morgan-Kaufman</name></publisher> 
  </book> 
  <article year="1999" type="inproceedings" month="June"> 
     <author><firstname>Mary</firstname><lastname>Fernandez</lastname></author> 
     <author><firstname>Alin</firstname><lastname>Deutsch</lastname></author> 
     <author><firstname>Dan</firstname><lastname>Suciu</lastname></author> 
     <title>Storing Semi-structured Data Using STORED</title> 
     <booktitle>ACM SIGMOD</booktitle> 
  </article> 
  <article year="1995" type="inproceedings" month="Jan"> 
     <author><firstname>Norman</firstname><lastname>Ramsey</lastname></author> 
     <author><firstname>Mary</firstname><lastname>Fernandez</lastname></author> 
     <title>The New Jersey Machine-Code Toolkit</title> 
     <booktitle>USENIX</booktitle> 
  </article> 
</bib>

Suppose we want a query (or transformation) that produces a bibliography containing only books, eliminating all articles. The following XTL expression expresses this query.

Figure 2: Match “book” element
<!ELEMENT bib AS {bib} (book*)> 
<!ELEMENT book (title, author+)> 
<!ATTLIST book year CDATA #REQUIRED> 
<!ELEMENT title  (#PCDATA)> 
<!ELEMENT author (firstname?, lastname)> 
<!ELEMENT firstname  (#PCDATA)> 
<!ELEMENT lastname (#PCDATA)>

This XTL expression declares the required elements and attributes (bib, book, and the substructures of book) in the DTD. It therefore produces a bibliography that includes only books. To keep the implementation of the XTL processor simple, XTL requires that any element that can be a root tag be declared with an XPath expression (AS {XPath expression}) in its element type declaration. The XPath expression specifies which part of the input XML data is mapped to the tag in the output structure. In this example, the bib tag in the input data is mapped to the bib tag in the output structure. When the XPath expression is omitted for an element in a content model or for an attribute in an attribute list declaration, XTL uses the output tag name as the default XPath expression. For example, <!ELEMENT bib AS {bib} (book*)> has the same meaning as <!ELEMENT bib AS {bib} (book* AS {book})>.
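To make the effect of this projection concrete, here is a minimal Python sketch of what the expression in Figure 2 asks for. It is not the XTL processor: the helper `project_books` is a hypothetical name, the input is an abridged copy of bib.xml, and the standard-library `xml.etree.ElementTree` module stands in for an XPath engine.

```python
import copy
import xml.etree.ElementTree as ET

# Abridged from bib.xml (Figure 1): one book and one article.
BIB = """<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <article year="1999" type="inproceedings" month="June">
    <author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
    <title>Storing Semi-structured Data Using STORED</title>
    <booktitle>ACM SIGMOD</booktitle>
  </article>
</bib>"""

def project_books(bib):
    """Hypothetical helper: keep only what the DTD of Figure 2 declares."""
    out = ET.Element("bib")
    for book in bib.findall("book"):        # book*, default XPath "book"
        new_book = ET.SubElement(out, "book", year=book.get("year"))
        for name in ("title", "author"):    # content model (title, author+)
            for child in book.findall(name):
                new_book.append(copy.deepcopy(child))
    return out

result = project_books(ET.fromstring(BIB))
print(ET.tostring(result, encoding="unicode"))
```

The article and the publisher element are absent from the result because no declaration in the output DTD maps them.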

However, it is a burden for users to spell out all the substructures of book when they are not transformed at all. Using ANY in the DTD solves this issue.

ANY

This example with ANY produces the same query result as the previous XTL expression.

Figure 3: Same result using ANY
<!ELEMENT bib AS {bib} (book*)> 
<!ELEMENT book ANY> 
<!ATTLIST book year CDATA #REQUIRED>

ANY plays the same role as specifying all the substructures (title, author, firstname, and lastname) of book in this XTL expression. Basically, ANY matches all substructures recursively unless another element type declaration matches them.

ANY is powerful when only a small part of the input XML data needs to change. The next example transforms author names by concatenating firstname and lastname into a new author_name element while keeping the rest of the input XML unchanged.

Figure 4: Power of ANY
<!ELEMENT bib AS {bib} ANY> 
<!ELEMENT author AS {author} (author_name)> 
<!ELEMENT author_name (#PCDATA AS {concat(firstname, ' ', lastname)})>

In addition, this example shows a different use of the AS clause. If the AS clause is specified for #PCDATA or attribute data, it maps a value to an output node. If it is specified for an element, it maps an input node to an output node.
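The behaviour of ANY with one overriding declaration can be imitated in a few lines of Python. This is only a sketch under assumptions: `copy_any` and `rewrite_author` are hypothetical helpers, the input is abridged from bib.xml, and comments in the input are ignored by the parser.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1999">
    <title>Data on the Web: from Relations to Semistructured Data &amp; XML</title>
    <author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
  </book>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
  </book>
</bib>"""

def rewrite_author(author):
    """Build <author><author_name>...</author_name></author>."""
    new = ET.Element("author")
    name = ET.SubElement(new, "author_name")
    # Mirrors concat(firstname, ' ', lastname); a missing firstname yields "".
    name.text = author.findtext("firstname", "") + " " + author.findtext("lastname", "")
    return new

def copy_any(elem, overrides):
    """Copy elem recursively unless a declaration overrides its tag."""
    if elem.tag in overrides:
        return overrides[elem.tag](elem)
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text
    for child in elem:
        copied = copy_any(child, overrides)
        copied.tail = child.tail          # keep text that follows the child
        new.append(copied)
    return new

out = copy_any(ET.fromstring(BIB), {"author": rewrite_author})
names = [n.text for n in out.iter("author_name")]
print(names)
```

Everything except author is copied unchanged, which is the point of declaring the root as ANY.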

Selection

This example extracts tags that satisfy conditions on elements or attributes. For example, let us make a bibliography that contains only books that were published after 1995 and whose titles contain XML. The following XTL expression expresses this query.

Figure 5: Match attribute values and strings
<!ELEMENT bib AS {bib} (book*)> 
<!ELEMENT book (title AS {title[contains(.,'XML')]}, author+)> 
<!ATTLIST  book year AS {@year[.>1995]} CDATA #REQUIRED> 
<!ELEMENT title (#PCDATA)> 
<!ELEMENT author ANY>

The XPath expression @year[.>1995] requires that a book's year be greater than 1995, and the expression title[contains(.,'XML')] requires that the book's title contain the string XML. Therefore, this XTL expression extracts the books whose year is greater than 1995 and whose title contains XML.
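The two conditions can be checked by hand on the sample data. The following Python sketch is an illustration only: `select_books` is a hypothetical helper, the input is abridged from bib.xml, and the predicates are written in Python because the standard-library ElementTree does not implement contains() or numeric comparison in paths.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1995"><title>An Introduction to Database Systems</title></book>
  <book year="1998"><title>Foundations for Object/Relational Databases</title></book>
  <book year="1999"><title>Data on the Web: from Relations to Semistructured Data &amp; XML</title></book>
</bib>"""

def select_books(bib, min_year, word):
    """Keep books with @year > min_year whose title contains word."""
    return [
        book for book in bib.findall("book")
        if int(book.get("year")) > min_year          # @year[.>1995]
        and word in book.findtext("title", "")       # title[contains(.,'XML')]
    ]

selected = select_books(ET.fromstring(BIB), 1995, "XML")
titles = [book.findtext("title") for book in selected]
print(titles)
```

Only the 1999 book passes both tests; the 1998 book fails the title condition and the 1995 book fails both.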

Rename

Consider a transformation that capitalizes all tag names.

Figure 6: Manipulate tag names
<!ELEMENT Bib     AS {bib}  (book | article)*> 
<!ELEMENT Book    AS {book} (title, author+, publisher)> 
<!ATTLIST  Book 
             Year AS {year}    CDATA #REQUIRED> 
<!ELEMENT Article AS {article} 
                           (author+, title, booktitle?, 
                            (shortversion | longversion)?)> 
<!ATTLIST  Article 
             Type   AS {type}  CDATA #REQUIRED 
             Year   AS {year}  CDATA #REQUIRED 
             Month  AS {month} CDATA #IMPLIED> 
<!ELEMENT Publisher AS {publisher} (name, address?)> 
<!ELEMENT Name      AS {name}      (#PCDATA)> 
<!ELEMENT Title     AS {title}     (#PCDATA)> 
<!ELEMENT Author    AS {author}    (firstname?, lastname)> 
<!ELEMENT Firstname AS {firstname} (#PCDATA)> 
<!ELEMENT Lastname  AS {lastname}  (#PCDATA)> 
<!ELEMENT Booktitle AS {booktitle} (#PCDATA)>

For example, the first line of this XTL expression indicates that the bib tag is transformed to the Bib tag.

Changing structure

The next example changes structure by building a list of authors.

Figure 7: Group to change structure (1)
<!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})> 
<!ELEMENT author ANY>

This example specifies a collection of author elements as the child elements of the bib element; the collection is extracted using the XPath expression //author. Moreover, the GROUP BY clause eliminates duplicate author elements. The key specified with GROUP BY is {.}, which denotes the author element itself. The elimination is based on deep equality of the author structure.
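One way to read "deep equality" is that two elements are equal when their tags, attributes, text, and children are equal recursively. The Python sketch below follows that reading, which is an assumption rather than a definition taken from the XTL specification; `deep_key` and `distinct` are hypothetical names.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><author><lastname>Date</lastname></author></book>
  <book>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
  </book>
</bib>"""

def deep_key(elem):
    """Hashable summary of an element: tag, attributes, text, children."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(deep_key(child) for child in elem),
    )

def distinct(elems):
    """Keep the first occurrence of each deeply-equal element."""
    seen, kept = set(), []
    for elem in elems:
        key = deep_key(elem)
        if key not in seen:
            seen.add(key)
            kept.append(elem)
    return kept

# iter("author") visits authors in document order, like //author.
authors = distinct(ET.fromstring(BIB).iter("author"))
names = [a.findtext("lastname") for a in authors]
print(names)
```

The second Date is dropped and the order of first occurrences is preserved.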

Let us look at a more complicated example that outputs a list of authors, each with a list of that author's titles. This example turns the input relationship between title and author upside down.

Figure 8: Group to change structure (2)
<!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})> 
<!ELEMENT author (firstname?, lastname, title* AS {//title[../author=$author]})>
<!ELEMENT title (#PCDATA)> 
<!ATTLIST  title year AS {../@year} CDATA #REQUIRED> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT lastname  (#PCDATA)>

As in the previous example, this XTL expression specifies a collection of author elements as the child elements of the bib tag. In addition, it specifies a collection of title tags as child elements of each author tag to build a title list for each author. The XPath expression //title[../author=$author] collects the title elements from the input XML data whose parent has an author element equal to the current author (referenced by $author).
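The inversion can be sketched in Python as a grouping of titles under each author. The sketch assumes that the comparison ../author=$author behaves like an XPath 1.0 node-set comparison, that is, it compares string values; `string_value` and `titles_by_author` are hypothetical helpers and the input is abridged from bib.xml.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book year="1999">
    <title>Data on the Web: from Relations to Semistructured Data &amp; XML</title>
    <author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
  </book>
  <article year="1999">
    <author><firstname>Mary</firstname><lastname>Fernandez</lastname></author>
    <author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
    <title>Storing Semi-structured Data Using STORED</title>
  </article>
</bib>"""

def string_value(elem):
    """Concatenated text of an element, like the XPath string-value."""
    return "".join(elem.itertext())

def titles_by_author(bib):
    """Map each author's string value to a list of (title, year) pairs."""
    grouped = {}
    for entry in bib:                       # book or article
        title = entry.find("title")
        if title is None:
            continue
        for author in entry.findall("author"):
            pair = (title.text, entry.get("year"))   # year AS {../@year}
            grouped.setdefault(string_value(author), []).append(pair)
    return grouped

grouped = titles_by_author(ET.fromstring(BIB))
for author, pairs in grouped.items():
    print(author, pairs)
```

Because a Python dict keeps insertion order, authors appear in the order of their first occurrence, which matches the GROUP BY behaviour of the previous example.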

Sort

Sort reorders a collection of elements in ascending or descending order using the value of a specified key tag. Let us look at a simple example that collects an author list and sorts it by name in alphabetical order.

Figure 9: Sort (1)
<!ELEMENT bib AS {bib} (author* IN {//author} GROUP BY {.} ORDER BY {.})> 
<!ELEMENT author ANY>

In addition to collecting an author list, as in the first changing-structure example, this example adds an ORDER BY clause to sort the list. When users omit the ascending or descending keyword, ascending order is the default. The next example sorts in descending order.

Figure 10: Sort (2)
<!ELEMENT bib AS {bib} 
      (author* IN {//author} GROUP BY {.} ORDER BY {.} DESC)> 
<!ELEMENT author ANY>

Let us look at an example that has two sort operations in different places. The next example sorts both the author list and each author's title list.

Figure 11: Sort (3)
<!ELEMENT bib AS {bib} 
           (author* IN {//author} GROUP BY {.} ORDER BY {lastname})> 
<!ELEMENT author AS {.} (firstname?, lastname, title* ORDER BY {.})> 
<!ELEMENT title  AS {//title [../author=$author]} (#PCDATA)> 
<!ATTLIST  title year AS {../@year} CDATA #REQUIRED> 
<!ELEMENT firstname (#PCDATA)> 
<!ELEMENT lastname  (#PCDATA)>

The first ORDER BY clause indicates that the author list is sorted using lastname as the key. The second ORDER BY clause indicates that the title list for each author is sorted using the title itself (expressed by .) as the key.
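As a rough illustration of ORDER BY with a key and an optional DESC keyword, the following Python sketch sorts authors by lastname. `order_by` is a hypothetical helper; Python compares strings by code point, whereas an XSLT processor may apply language-dependent collation, so the two can differ on non-ASCII data.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author><firstname>Dan</firstname><lastname>Suciu</lastname></author>
    <author><lastname>Date</lastname></author>
  </book>
  <book>
    <author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author>
  </book>
</bib>"""

def order_by(elems, key, desc=False):
    """Return elems sorted by key(elem); ascending unless desc is set."""
    return sorted(elems, key=key, reverse=desc)

def by_lastname(author):
    return author.findtext("lastname", "")

authors = list(ET.fromstring(BIB).iter("author"))
ascending = [by_lastname(a) for a in order_by(authors, by_lastname)]
descending = [by_lastname(a) for a in order_by(authors, by_lastname, desc=True)]
print(ascending, descending)
```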

Join

The join operation is central to the relational model because the relation is the unit of data and all information is divided into a collection of relations. In XTL, on the other hand, a join simply combines several XML data into one. Let us look at a join example that combines two XML data. One input is a book catalogue (BookCatalogue.xml) that does not contain book prices, and the other is a bookstore list (BookCosts.xml) that includes the price of each book at each bookstore.

Figure 12: Sample document: book catalogue (BookCatalogue.xml)
<?xml version="1.0"?> 
<BookCatalogue> 
  <Book> 
    <Title>My Life and Times</Title> 
    <Author>Paul McCartney</Author> 
    <Date>July, 1998</Date> 
    <ISBN>94303-12021-43892</ISBN> 
    <Publisher>McMillin Publishing</Publisher> 
  </Book> 
  <Book> 
    <Title>Illusions The Adventures of a Reluctant Messiah</Title> 
    <Author>Richard Bach</Author> 
    <Date>1977</Date> 
    <ISBN>0-440-34319-4</ISBN> 
    <Publisher>Dell Publishing Co.</Publisher> 
  </Book> 
  <Book> 
    <Title>The First and Last Freedom</Title> 
    <Author>J. Krishnamurti</Author> 
    <Date>1954</Date> 
    <ISBN>0-06-064831-7</ISBN> 
    <Publisher>Harper &amp; Row</Publisher>
  </Book> 
</BookCatalogue>

Figure 13: Sample document: bookstore list with prices (BookCosts.xml)
<?xml version="1.0"?> 
<BookCosts> 
  <Book> 
    <Title>My Life and Times</Title> 
    <Cost store="Walden Books">$12.95</Cost> 
    <Cost store="Barnes &amp; Noble">$10.95</Cost>
  </Book> 
  <Book> 
    <Title>Illusions The Adventures of a Reluctant Messiah</Title> 
    <Cost store="Walden Books">$5.95</Cost> 
    <Cost store="Barnes &amp; Noble">$6.95</Cost>
  </Book> 
  <Book> 
    <Title>The First and Last Freedom</Title> 
    <Cost store="Walden Books">$9.95</Cost> 
    <Cost store="Barnes &amp; Noble">$8.95</Cost>
  </Book> 
</BookCosts>

The XTL expression below produces XML data that combines the two XML data above.

Figure 14: Join
<!ELEMENT Bib AS {BookCatalogue} (Book*)> 
<!ELEMENT Book   (Title, Author+, Date, ISBN, Publisher, Cost*)> 
<!ELEMENT Title  (#PCDATA)> 
<!ELEMENT Author (#PCDATA)> 
<!ELEMENT Date   (#PCDATA)> 
<!ELEMENT ISBN   (#PCDATA)> 
<!ELEMENT Publisher (#PCDATA)> 
<!ELEMENT Cost AS {document('BookCosts.xml')/BookCosts/Book[Title=$Title]/Cost} (#PCDATA)> 
<!ATTLIST  Cost store CDATA #REQUIRED>

The important part of this example is the XPath expression for the Cost tag on the second-to-last line. The document function, which is defined in the XSLT specification, extracts part of another XML document (BookCosts.xml) and combines it with the base BookCatalogue.xml. The path BookCosts/Book/Cost following the document function specifies the tags to be extracted and mapped to the Cost tag. The predicate [Title=$Title] is a selection condition on the Book tag: the Title tag under that Book tag must equal $Title. $Title refers to the Title tag that is a sibling of the current Cost tag and is declared on the third line of this XTL example.
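The join on Title can be sketched in Python as a lookup from title to Cost elements. This is an illustration, not the generated XSLT: `join_costs` is a hypothetical helper, both inputs are abridged and held in strings rather than files, and the standard-library ElementTree replaces the document function.

```python
import copy
import xml.etree.ElementTree as ET

CATALOGUE = """<BookCatalogue>
  <Book><Title>My Life and Times</Title><Author>Paul McCartney</Author></Book>
  <Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author></Book>
</BookCatalogue>"""

COSTS = """<BookCosts>
  <Book>
    <Title>My Life and Times</Title>
    <Cost store="Walden Books">$12.95</Cost>
    <Cost store="Barnes &amp; Noble">$10.95</Cost>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Cost store="Walden Books">$9.95</Cost>
    <Cost store="Barnes &amp; Noble">$8.95</Cost>
  </Book>
</BookCosts>"""

def join_costs(catalogue, costs):
    """Append to each catalogue Book the Cost elements with the same Title."""
    by_title = {}
    for book in costs.findall("Book"):      # BookCosts/Book[Title=$Title]/Cost
        by_title.setdefault(book.findtext("Title"), []).extend(book.findall("Cost"))
    out = ET.Element("Bib")                 # <!ELEMENT Bib AS {BookCatalogue} ...>
    for book in catalogue.findall("Book"):
        new_book = copy.deepcopy(book)
        matches = by_title.get(book.findtext("Title"), [])
        new_book.extend([copy.deepcopy(cost) for cost in matches])
        out.append(new_book)
    return out

joined = join_costs(ET.fromstring(CATALOGUE), ET.fromstring(COSTS))
stores = [c.get("store") for c in joined[0].findall("Cost")]
print(stores)
```

Building the lookup once avoids rescanning BookCosts for every catalogue entry; a direct translation of the XPath predicate would rescan it each time.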

XTL Specification

The appendix shows the XTL syntax. The features of XTL and its extensions to DTD are as follows.

Output structure is specified with DTD

The DTD in XTL specifies the output structure of a transformation. It is therefore easy for users to understand and specify the transformation result.

Embedded XPath clause into DTD

An XPath clause is embedded in the DTD syntax for each element or attribute. The clause specifies a mapping rule from input XML to output XML. There are two types of mapping rule: 1) node map and 2) value map.

  1. node map: This maps an input node (element or attribute) to an output node. It also defines the current XPath context for child nodes. Basically, an XPath is relative to either its parent tag's XPath or the XPath of its enclosing parenthesis. For example, <!ELEMENT result AS {bib} (bo* AS {book})> means that the result element is mapped from bib and that the current XPath context is set to bib, so the next XPath, book, corresponds to bib/book. Another example, <!ELEMENT result AS {bib} ((bo AS {.})* AS {book})>, expresses the same meaning in a different way. In this case the XPath of the output element bo corresponds to bib/book/. (the book node itself).
  2. value map: This maps a value to an output text node (#PCDATA for an element, or any attribute). For example, <!ELEMENT count (#PCDATA AS {count(author)})> means that the value of the count element is set to the result of count(author).

The XPath clause can be omitted for an element or attribute. By default, the output tag name is used as the XPath expression for a node map, and text() is used as the XPath expression for a value map. For example, <!ELEMENT bib AS {bib} (book*)> has the same meaning as <!ELEMENT bib AS {bib} (book* AS {book})>, and <!ELEMENT book (#PCDATA)> has the same meaning as <!ELEMENT book (#PCDATA AS {text()})>.

DTD Extension

To increase the flexibility of specifying output structures, the DTD syntax is extended as follows.

  1. The mixed content model extension handles #PCDATA and elements in the same way. XTL can therefore express declarations such as <!ELEMENT paper (#PCDATA, title)> and <!ELEMENT paper (title | #PCDATA)>.
  2. The ORDER BY clause sorts a collection of elements using a specified key node. For example, <!ELEMENT bib AS {bib} (author* IN {//author} ORDER BY {.})> means that the collection of author elements is sorted using the value of the author element itself as the key.
  3. The GROUP BY clause eliminates duplicate elements using a specified key node. For example, <!ELEMENT bib AS {bib} (author* IN {//author} GROUP BY {.})> means that the collection of author elements must not contain the same author twice, where sameness is judged by deep equality of the element itself.
  4. A tag variable maps an input tag to an output tag with the same name. For example, <!ELEMENT bib AS {bib} ($a AS {paper|article})> means that $a is a tag variable and the input tag name (paper or article) is used as the result tag name.

Cardinality Constraint

In DTD, an element's cardinality (*, +, ?, or none) can be specified in a content model. Because the DTD in XTL specifies the output structure of a transformation, XTL treats the cardinality of an element as a constraint on its mapping rule. For example, <!ELEMENT bib AS {bib} (book* AS {book})> indicates that the XPath book should return a collection of zero or more book elements, and <!ELEMENT bib AS {bib} (book+ AS {book})> indicates that it should return a collection of one or more book elements.
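A possible reading of the cardinality constraint is sketched below in Python. It follows the generated XSLT of Figure 16, where a required child is guarded by a test such as count(lastname)>=1 and single-valued children are selected with [1]. The function `apply_cardinality` and its return convention (None when the constraint fails) are assumptions made for this illustration.

```python
def apply_cardinality(nodes, cardinality=""):
    """Return the nodes to emit, or None when the constraint is violated.

    Hypothetical helper; `nodes` is the list an embedded XPath returned.
    """
    nodes = list(nodes)
    if cardinality == "*":      # zero or more: always satisfied
        return nodes
    if cardinality == "+":      # one or more
        return nodes if nodes else None
    if cardinality == "?":      # optional: at most the first node
        return nodes[:1]
    if cardinality == "":       # required: need one, use the first
        return nodes[:1] if nodes else None
    raise ValueError(f"unknown cardinality: {cardinality!r}")

print(apply_cardinality([], "*"))
print(apply_cardinality([], "+"))
print(apply_cardinality(["book1", "book2"], "?"))
```

When the function returns None, the enclosing element would not be produced, which is how the selection example drops books whose title or year fails its condition.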

XTL also has to define the meaning of ANY and EMPTY. ANY means that the element can have any content model (structure) without constraint. EMPTY means that the element must be empty. For example, <!ELEMENT author AS {author} ANY> indicates that the output author can have any content model and that all subelements of the input author are mapped to the transformation result. On the other hand, <!ELEMENT author AS {author} EMPTY> indicates that the input author must be an empty element.

Variable binding

A variable is automatically bound for each element and attribute. Any embedded XPath can refer to these variables as long as the node assigned to the variable is reachable from the referring node via many-to-one or one-to-one associations in the DTD graph. For example, in <!ELEMENT b AS {bib} (a* AS {$b/author})>, $b refers to the input bib element. If the XTL expression has a recursive element definition, a child references its closest enclosing ancestor.

XSLT generator for XTL

I have implemented an XTL processor that translates XTL expressions into XSLT expressions. This generator is useful for users who want to transform XML, because XTL is much simpler than XSLT. They can use the generator as a front-end tool for XSLT.

This processor does not implement the XTL specification fully, because some parts of XTL are difficult to translate into XSLT owing to gaps between the XTL and XSLT models. However, the generator translates most of XTL into XSLT, so users can use it for many XML transformations.

Translation from XTL to XSLT

This section briefly describes how the XTL processor translates an XTL expression into an XSLT expression. The details follow in the next section.

First, let us review XSLT. An XSLT expression is composed of several template declarations. The XSLT processor reads an XML document and checks whether any template matches an input element, attribute, or text node. If a template matches, it is applied and executed. If no template matches, the default action is applied, which outputs the input text.

Template generation

Basically, each element type declaration is translated into one XSLT template declaration. There are two patterns, regardless of the content model:

  1. An XPath is specified for the declared element, as in <!ELEMENT bib AS {bib} (author*)>.
  2. No XPath is specified for the declared element, as in <!ELEMENT bib (author*)>.

There are two cases in which users must use the first pattern: 1) the element is a candidate root element, or 2) the element is a candidate descendant of an element declared as ANY. Otherwise, users can use either pattern. This is a little confusing, but it reduces the number of template declarations that the XTL processor generates. An element declaration of the first pattern is translated into two template declarations: <xsl:template match="bib">... for case 1) and <xsl:template match="bib" mode="any">... for case 2). An element declaration of the second pattern is translated into a template declaration such as <xsl:template match="*|@*|text()" mode="bib">.... The declared element name is used as the mode name in the translated template declaration.
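The two patterns can be summarized by a small Python function that returns the opening tags of the generated templates. It is a sketch, not the processor's code: `template_headers` is a hypothetical name, and it assumes that the match attribute is taken from the XPath of the AS clause (in the paper's example the XPath and the element name are both bib, so the source does not settle this).

```python
def template_headers(element_name, xpath=None):
    """Opening tags of the templates generated for one element declaration."""
    if xpath is not None:
        # First pattern: one template for the root case and one for the
        # case where the element appears below an ANY element.
        return [
            f'<xsl:template match="{xpath}">',
            f'<xsl:template match="{xpath}" mode="any">',
        ]
    # Second pattern: a single catch-all template selected by mode name.
    return [f'<xsl:template match="*|@*|text()" mode="{element_name}">']

for header in template_headers("bib", "bib") + template_headers("author"):
    print(header)
```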

The XTL processor generates three additional types of template declaration:

  1. For an element specified with a GROUP BY clause in a content model.
  2. To avoid invoking the default action of XSLT (if no template is applied to a node, its text is output).
  3. For processing any node that is a descendant of an element specified with ANY.

In case 1), a template declaration such as <xsl:template match="*|@*|text()" mode="author_.">... is generated for each element specified with GROUP BY; its purpose is to eliminate duplicates of the specified node. The mode name is the concatenation of the target element name, the string "_", and the key of the GROUP BY clause.

In case 2), the template declaration <xsl:template match="text()"></xsl:template> is generated, so that an input node produces no text output if it does not match any other template declaration.

In case 3), a template declaration:

<xsl:template match="*|@*|text()" mode="any">
  <xsl:copy>
    <xsl:apply-templates select="*|@*|text()" mode="any"/>
  </xsl:copy>
</xsl:template>

is generated as the default action for ANY. It copies an input node recursively, including its substructure, as long as no other template declaration, such as <xsl:template match="bib" mode="any">, overrides this default template.
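The duplicate test used by the generated GROUP BY template (shown later in Figure 16 as count($nodes[$pos>position() and .=current()/.])=0) keeps a node only when no earlier node in the list has the same string value. The Python sketch below imitates that test; `string_value` and `group_by_self` are hypothetical helpers, and the input is written without whitespace between tags because the generated stylesheet strips whitespace-only text.

```python
import xml.etree.ElementTree as ET

def string_value(elem):
    """Concatenated text of an element, like the XPath string-value."""
    return "".join(elem.itertext())

def group_by_self(nodes):
    """Keep a node only if no earlier node has the same string value."""
    nodes = list(nodes)
    kept = []
    for pos, node in enumerate(nodes):
        earlier_equal = [
            n for n in nodes[:pos] if string_value(n) == string_value(node)
        ]
        if len(earlier_equal) == 0:       # count(...) = 0
            kept.append(node)
    return kept

BIB = (
    "<bib>"
    "<author><lastname>Date</lastname></author>"
    "<author><lastname>Date</lastname></author>"
    "<author><lastname>Darwen</lastname></author>"
    "<author><firstname>Dar</firstname><lastname>wen</lastname></author>"
    "</bib>"
)
kept = group_by_self(ET.fromstring(BIB).iter("author"))
values = [string_value(a) for a in kept]
print(values)
```

The last author in the sample is also dropped, because its string value equals that of the third author even though the structures differ. A comparison by string value is therefore weaker than deep structural equality, and the test is quadratic in the number of nodes.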

Details of translation

So far I have described how the XTL processor generates each template declaration, without going into what the templates contain. Let us look inside them using an example. The example XTL expression is:

Figure 15: Sample query
<!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})> 
<!ELEMENT author (firstname?, lastname, 
                  title* AS {//title[../author=$author]} ORDER BY {.})> 
<!ELEMENT title (#PCDATA)> 
<!ATTLIST  title year AS {../@year} CDATA #REQUIRED> 
<!ELEMENT firstname ANY> 
<!ELEMENT lastname  EMPTY>

and the generated XSLT result is:

Figure 16: XSLT generated
<?xml version="1.0" encoding="euc-jp"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" encoding="euc-jp" omit-xml-declaration="no"/>
<xsl:strip-space elements="*"/>
 
<!-- This is for bib element -->
<xsl:template match="bib">
  <xsl:variable name="bib" select="."/>
  <xsl:element name="bib">
    <xsl:apply-templates select="(//author)" mode="author_.">
      <xsl:with-param name="nodes" select="//author"/>
      <xsl:with-param name="bib" select="$bib"/>
    </xsl:apply-templates>
  </xsl:element>
</xsl:template>
 
<!-- This is for bib element UNDER ANY -->
<xsl:template match="bib" mode="any">
  <xsl:variable name="bib" select="."/>
  <xsl:element name="bib">
    <xsl:apply-templates select="(//author)" mode="author_.">
      <xsl:with-param name="nodes" select="//author"/>
      <xsl:with-param name="bib" select="$bib"/>
    </xsl:apply-templates>
  </xsl:element>
</xsl:template>
 
<!-- This is for author element -->
<xsl:template match="*|@*|text()" mode="author">
  <xsl:variable name="author" select="."/>
  <xsl:param name="bib"/>
  <xsl:if test="count(lastname)>=1">
    <xsl:if test="lastname=''">
      <xsl:element name="author">
        <xsl:apply-templates select="(firstname)[1]" mode="firstname">
          <xsl:with-param name="author" select="$author"/>
          <xsl:with-param name="bib" select="$bib"/>
        </xsl:apply-templates>
        <xsl:apply-templates select="(lastname)[1]" mode="lastname">
          <xsl:with-param name="author" select="$author"/>
          <xsl:with-param name="bib" select="$bib"/>
        </xsl:apply-templates>
        <xsl:apply-templates select="(//title[../author=$author])" mode="title">
          <xsl:with-param name="author" select="$author"/>
          <xsl:with-param name="bib" select="$bib"/>
          <xsl:sort select="." order="ascending"/>
        </xsl:apply-templates>
      </xsl:element>
    </xsl:if>
  </xsl:if>
</xsl:template>
 
<!-- This is for firstname element -->
<xsl:template match="*|@*|text()" mode="firstname">
  <xsl:variable name="firstname" select="."/>
  <xsl:param name="author"/>
  <xsl:param name="bib"/>
  <xsl:element name="firstname">
    <xsl:apply-templates select="child::node()" mode="any"/>
  </xsl:element>
</xsl:template>
 
<!-- This is for lastname element -->
<xsl:template match="*|@*|text()" mode="lastname">
  <xsl:variable name="lastname" select="."/>
  <xsl:param name="author"/>
  <xsl:param name="bib"/>
  <xsl:if test=".=''">
    <xsl:element name="lastname">
    </xsl:element>
  </xsl:if>
</xsl:template>
 
<!-- This is for title element -->
<xsl:template match="*|@*|text()" mode="title">
  <xsl:variable name="title" select="."/>
  <xsl:param name="author"/>
  <xsl:param name="bib"/>
  <xsl:if test="count(../@year)>=1">
    <xsl:element name="title">
      <xsl:variable name="year" select="../@year"/>
      <xsl:if test="count($year)>=1">
        <xsl:attribute name="year">
          <xsl:value-of select="$year"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:if>
</xsl:template>
 
<!-- This is for GROUP author element BY . -->
<xsl:template match="*|@*|text()" mode="author_.">
  <xsl:param name="nodes"/>
  <xsl:param name="bib"/>
  <xsl:variable name="pos" select="position()"/>
  <xsl:if test="count($nodes[$pos>position() and .=current()/.])=0">
    <xsl:apply-templates select="." mode="author">
      <xsl:with-param name="bib" select="$bib"/>
    </xsl:apply-templates>
  </xsl:if>
</xsl:template>
 
<xsl:template match="text()"></xsl:template>
 
<xsl:template match="*|@*|text()" mode="any">
  <xsl:copy>
    <xsl:apply-templates select="*|@*|text()" mode="any"/>
  </xsl:copy>
</xsl:template>
 
</xsl:stylesheet>

First, there are three types of content model in an element type declaration: ANY, EMPTY, and element content. The original DTD also has mixed content, but the DTD of XTL does not, because it is extended to handle #PCDATA and elements in the same way (that is, mixed content is merged into element content).

ANY

A template declaration is generated according to the rules described in “Template generation”. For example, <!ELEMENT firstname ANY> is translated into the template:

<xsl:template match="*|@*|text()" mode="firstname">
  <xsl:variable name="firstname" select="."/>
  <xsl:param name="author"/>
  <xsl:param name="bib"/>
  <xsl:element name="firstname">
    <xsl:apply-templates select="child::node()" mode="any"/>
  </xsl:element>
</xsl:template>

Some variables and parameters are declared first; I explain them later in “Variable binding”. Next, the element is created (expressed using <xsl:element name="firstname">), and all child nodes are applied to the templates whose mode name is "any". This means that all descendant nodes under an element declared as ANY are output. Two types of template can be applied from the above template: 1) templates that output descendant nodes with the same structure as the input, and 2) templates that transform descendant nodes according to an element type declaration that users specify in the XTL expression.

  1. The following template declaration is generated for processing the descendant nodes of an element declared as ANY (firstname). It is the default action and outputs the same structure as its input.
    <xsl:template match="*|@*|text()" mode="any">
      <xsl:copy>
        <xsl:apply-templates select="*|@*|text()" mode="any"/>
      </xsl:copy>
    </xsl:template>
    
    Any node that matches this template is copied (<xsl:copy>), and all of its child nodes are applied to this template again, as long as users do not specify an element type declaration for that node in the XTL expression.
  2. If users specify an element type declaration with an XPath for the declared element (in order to override the default action in 1), a template declaration is generated for it. For example, the above <!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})> generates the template:
    <xsl:template match="bib" mode="any">
      <xsl:variable name="bib" select="."/>
      <xsl:element name="bib">
        <xsl:apply-templates select="(//author)" mode="author_.">
          <xsl:with-param name="nodes" select="//author"/>
          <xsl:with-param name="bib" select="$bib"/>
        </xsl:apply-templates>
      </xsl:element>
    </xsl:template>
    
    Its internal processing is the same as that of the template <xsl:template match="bib"> described in “Element content”.

This feature is especially powerful when users want to transform only a small part of the structure, or only the tag name of a node, in an input XML document. They can declare the root element as ANY and then specify only the parts they want to transform, using element type declarations or attribute list declarations.
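The interplay between the default copy action and a user-specified override can be sketched outside XSLT. The following Python fragment is only an illustration of the idea, not the XTL processor itself: the names copy_any and rename_to, the overrides dictionary, and the sample document are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

def copy_any(node, overrides):
    """Mimic mode="any": copy a node unless an override exists for its tag."""
    handler = overrides.get(node.tag)
    if handler is not None:
        # Corresponds to a user-specified element type declaration.
        return handler(node, overrides)
    # Default action: copy the element, its attributes and text,
    # then process the children in the same way (like <xsl:copy>).
    out = ET.Element(node.tag, dict(node.attrib))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(copy_any(child, overrides))
    return out

def rename_to(new_tag):
    """Build a hypothetical override that changes only the tag name."""
    def handler(node, overrides):
        out = ET.Element(new_tag, dict(node.attrib))
        out.text = node.text
        out.tail = node.tail
        for child in node:
            out.append(copy_any(child, overrides))
        return out
    return handler

# Made-up sample input; only <author> is transformed, the rest is copied.
source = ET.fromstring(
    "<bib><book><title>XML</title><author>Smith</author></book></bib>"
)
result = copy_any(source, {"author": rename_to("writer")})
print(ET.tostring(result, encoding="unicode"))
```

With an empty overrides dictionary the function reproduces its input, which corresponds to declaring the root element as ANY without any further declarations.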

EMPTY

A template declaration is generated according to the rules described in “Query feature”. For example, <!ELEMENT lastname EMPTY> is translated into a template:

<xsl:template match="*|@*|text()" mode="lastname">
  <xsl:variable name="lastname" select="."/>
  <xsl:param name="author"/>
  <xsl:param name="bib"/>
  <xsl:if test=".=''">
    <xsl:element name="lastname">
    </xsl:element>
  </xsl:if>
</xsl:template>
The variables and parameters are handled in the same way as for ANY. A condition then checks whether the mapped element is empty. If it is, the element is created (<xsl:element name="lastname">).

Element content

A template declaration is generated according to the rules described in 5-1. For example, <!ELEMENT title (#PCDATA)> is translated into a template:

<xsl:template match="*|@*|text()" mode="title">
  <xsl:variable name="title" select="."/>
  <xsl:param name="author"/>
  <xsl:param name="bib"/>
  <xsl:if test="count(../@year)>=1">
    <xsl:element name="title">
      <xsl:variable name="year" select="../@year"/>
      <xsl:if test="count($year)>=1">
        <xsl:attribute name="year">
          <xsl:value-of select="$year"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:if>
</xsl:template>
If the declared element has elements in its content model or has attributes, the generated template checks their cardinality constraints before applying the templates for child elements. The details are described in “Cardinality constraint”.

The generated template declaration depends on the content model of the element type declaration in the XTL expression: 1) sequence (A,B,...), or 2) choice (A|B|...).

  1. sequence: The generated template should process each child node sequentially. For example, an element type declaration <!ELEMENT author (firstname?, lastname, title* AS {//title[../author=$author]} ORDER BY {.})> generates:
    <xsl:element name="author">
      <xsl:apply-templates select="(firstname)[1]" mode="firstname">
        <xsl:with-param name="author" select="$author"/>
        <xsl:with-param name="bib" select="$bib"/>
      </xsl:apply-templates>
      <xsl:apply-templates select="(lastname)[1]" mode="lastname">
        <xsl:with-param name="author" select="$author"/>
        <xsl:with-param name="bib" select="$bib"/>
      </xsl:apply-templates>
      <xsl:apply-templates select="(//title[../author=$author])" mode="title">
        <xsl:with-param name="author" select="$author"/>
        <xsl:with-param name="bib" select="$bib"/>
        <xsl:sort select="." order="ascending"/>
      </xsl:apply-templates>
    </xsl:element>
    
    There is one xsl:apply-templates for each of the three child nodes (firstname, lastname, and title).
  2. choice: The generated template should process each child node conditionally. For example, an element type declaration <!ELEMENT author (A|B|C)> generates:
    <xsl:element name="author">
      <xsl:choose>
        <xsl:when test="A">
          <xsl:apply-templates select="(A)[1]" mode="A">
            <xsl:with-param name="root" select="$root"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:when test="B">
          <xsl:apply-templates select="(B)[1]" mode="B">
            <xsl:with-param name="root" select="$root"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:when test="C">
          <xsl:apply-templates select="(C)[1]" mode="C">
            <xsl:with-param name="root" select="$root"/>
          </xsl:apply-templates>
        </xsl:when>
      </xsl:choose>
    </xsl:element>
    
    There is one xsl:when for each of the three child nodes (A, B, and C).
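The two cases above can be sketched as a small generator. This is a simplified illustration, not the actual XTL processor: the function names apply_templates and generate_content are assumptions, and the sketch ignores AS mappings, cardinality indicators, and ORDER BY, always selecting the first matching child as in the listings above.

```python
def apply_templates(name, params, indent=""):
    """Emit one xsl:apply-templates call for a child element."""
    lines = [f'{indent}<xsl:apply-templates select="({name})[1]" mode="{name}">']
    for p in params:
        # Pass every known variable down to the applied template.
        lines.append(f'{indent}  <xsl:with-param name="{p}" select="${p}"/>')
    lines.append(f"{indent}</xsl:apply-templates>")
    return lines

def generate_content(kind, children, params):
    """Emit the body for a sequence (A,B,...) or a choice (A|B|...)."""
    if kind == "sequence":
        # One apply-templates per child, in declaration order.
        lines = []
        for child in children:
            lines += apply_templates(child, params)
        return lines
    if kind == "choice":
        # One xsl:when per alternative, tested by the child's existence.
        lines = ["<xsl:choose>"]
        for child in children:
            lines.append(f'  <xsl:when test="{child}">')
            lines += apply_templates(child, params, indent="    ")
            lines.append("  </xsl:when>")
        lines.append("</xsl:choose>")
        return lines
    raise ValueError(f"unknown content model kind: {kind}")

print("\n".join(generate_content("choice", ["A", "B", "C"], ["root"])))
```

Calling generate_content("sequence", ["firstname", "lastname"], ["author", "bib"]) yields the first two xsl:apply-templates calls of the author example.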

When a content model is nested, each nested part (enclosed in parentheses) can be mapped with an XPath and behaves the same as if it were declared as the content model of its own element declaration, except that the nested part has no name. For example, <!ELEMENT author (A|B|(C1,C2)*)> behaves like the combination of <!ELEMENT author (A|B|foo*)> and <!ELEMENT foo (C1,C2)>. To avoid generating a temporary element like foo, the XTL processor generates <xsl:for-each> instead of <xsl:apply-templates>. For example, <!ELEMENT author (A|B|(C1,C2) AS {C})> generates:

<xsl:when test="C">
  <xsl:for-each select="C">
    <xsl:apply-templates select="(C1)[1]" mode="C1">
      <xsl:with-param name="author" select="$author"/>
      <xsl:with-param name="bib" select="$bib"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="(C2)[1]" mode="C2">
      <xsl:with-param name="author" select="$author"/>
      <xsl:with-param name="bib" select="$bib"/>
    </xsl:apply-templates>
  </xsl:for-each>
</xsl:when>

GROUP BY clause

A template for an element whose content model contains a GROUP BY clause (I refer to this declared element as the parent element of the GROUP BY element) invokes xsl:apply-templates on a special template that eliminates duplicates, described below. For example, the template for the bib element invokes <xsl:apply-templates select="(//author)" mode="author_."> instead of directly invoking the template for the author element. In addition, it passes as a parameter the same node-set that it selects for the special template. For example, the template for the bib element passes the result of //author as the parameter nodes.

Let us look at the special template using an example. <!ELEMENT bib AS {bib} (author* AS {//author} GROUP BY {.})> generates both a template for the bib element and the following template:

<xsl:template match="*|@*|text()" mode="author_.">
  <xsl:param name="nodes"/>
  <xsl:param name="bib"/>
  <xsl:variable name="pos" select="position()"/>
  <xsl:if test="count($nodes[$pos>position() and .=current()/.])=0">
    <xsl:apply-templates select="." mode="author">
      <xsl:with-param name="bib" select="$bib"/>
    </xsl:apply-templates>
  </xsl:if>
</xsl:template>
This template eliminates duplicate author elements. Its mode name is the concatenation of the target element name and the key node name of the GROUP BY clause. To eliminate duplicates, the XTL processor checks whether some previously processed node has the same value as the current node. If so, it skips the current node; if not, it processes the current node. For example, the generated condition <xsl:if test="count($nodes[$pos>position() and .=current()/.])=0"> requires that the nodes parameter (here, the author elements) contain no node that precedes the current node and has the same value as the current node.
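The duplicate check can be restated in a few lines of Python. This is only a sketch of the condition described above, applied to plain strings standing in for the string values of the author nodes; the function name first_occurrences and the sample values are assumptions.

```python
def first_occurrences(values):
    """Keep a value only if no earlier position holds an equal value."""
    kept = []
    for pos, value in enumerate(values):
        # Mirrors count($nodes[$pos > position() and . = current()]) = 0:
        # collect the earlier entries that equal the current one.
        earlier_equal = [v for v in values[:pos] if v == value]
        if len(earlier_equal) == 0:
            kept.append(value)
    return kept

# Made-up string values of the nodes selected by //author.
authors = ["Stevens", "Abiteboul", "Stevens", "Suciu", "Abiteboul"]
print(first_occurrences(authors))  # ['Stevens', 'Abiteboul', 'Suciu']
```

Like the generated template, the sketch compares every node with all of its predecessors, so its cost grows quadratically with the number of selected nodes. A hash-based set would be faster, but it would no longer mirror the generated XSLT.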

ORDER BY clause

If an ORDER BY clause is specified for an element in a content model, xsl:sort is added to the content of the xsl:apply-templates that applies the template for that element. For example,

<!ELEMENT author (firstname?, lastname, title* AS {//title[../author=$author]} ORDER BY {.})> 
generates
<xsl:apply-templates select="(//title[../author=$author])" mode="title">
  <xsl:with-param name="author" select="$author"/>
  <xsl:with-param name="bib" select="$bib"/>
  <xsl:sort select="." order="ascending"/>
</xsl:apply-templates>
Note the additional <xsl:sort select="." order="ascending"/>.

Cardinality constraint

The XTL processor checks cardinality constraints (*, +, ?, and none in DTD) as follows. It checks that at least one node exists for every element that is transitively reached via + or no cardinality indicator in the DTD. When it meets a * or ? cardinality, the XTL processor stops checking the descendants below that point. It does not check that there is at most one node in the case of ?, or exactly one node in the case of no indicator, because an XPath expression like /bib/author is sometimes intended to mean the first node of the node-set /bib/author. An alternative implementation would have the XTL processor check the upper bound of one node for ? and for no indicator, leaving users responsible for writing an XPath like /bib/author[1].
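The transitive part of this rule can be sketched as follows. The sketch assumes a simplified representation that is not taken from the XTL processor: each declaration is a list of (child name, cardinality) pairs, where the cardinality is '', '+', '?', or '*', and choice content models are not considered. The function name required_elements and the sample declarations are hypothetical.

```python
def required_elements(decls, name, seen=None):
    """List the elements that must match at least one node below `name`."""
    seen = set() if seen is None else seen
    required = []
    for child, card in decls.get(name, []):
        # Only '+' and no indicator ('') impose a lower bound of one node.
        if card in ("", "+") and child not in seen:
            seen.add(child)
            required.append(child)
            # Keep following required children transitively.
            required += required_elements(decls, child, seen)
        # '*' and '?' allow zero nodes, so checking stops at that child.
    return required

# Hypothetical declarations in the simplified representation.
decls = {
    "bib": [("book", "+"), ("author", "*")],
    "book": [("title", ""), ("publisher", "?")],
    "author": [("firstname", "?"), ("lastname", ""), ("title", "*")],
    "publisher": [("name", "")],
}
print(required_elements(decls, "bib"))     # ['book', 'title']
print(required_elements(decls, "author"))  # ['lastname']
```

In the first call, name is not reported even though it is required inside publisher, because publisher itself is optional and the check stops there.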

Variable binding

A declared element name is defined as a variable in its translated template declaration. For example, the current node is bound to a variable by <xsl:variable name="bib" select="."/> in the template for the bib element. This variable is passed as a parameter of xsl:apply-templates so that the applied templates (those that process child elements) can refer to it; <xsl:with-param name="bib" select="$bib"/> is an example. As a result, any parent element can be referenced as a variable from any template declaration.

Ideally, any node reachable from the current node via a one-to-one or many-to-one association in the DTD graph should be referable as a variable. However, the XTL processor does not fully support this (only references from a child node to a parent node are supported), because such references are difficult to map to XSLT expressions.
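The binding scheme can be sketched as two small generator functions. They are illustrative only: the names template_header and with_params are assumptions, and the emitted lines simply follow the order used in the listings of this paper (the element's own variable first, then one parameter per parent element).

```python
def template_header(element, parents):
    """Emit the opening lines of the template for `element`."""
    lines = [
        f'<xsl:template match="*|@*|text()" mode="{element}">',
        # The current node is bound to a variable named after the element.
        f'  <xsl:variable name="{element}" select="."/>',
    ]
    # Each parent element arrives as a parameter of the same name.
    lines += [f'  <xsl:param name="{p}"/>' for p in parents]
    return lines

def with_params(element, parents):
    """Emit the parameters passed on to templates for child elements."""
    return [
        f'<xsl:with-param name="{name}" select="${name}"/>'
        for name in [element] + parents
    ]

print("\n".join(template_header("title", ["author", "bib"])))
print("\n".join(with_params("author", ["bib"])))
```

The second call reproduces the two xsl:with-param lines that the author template passes to the templates for firstname, lastname, and title.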

Attribute list declaration

An attribute list declaration for an element is translated into an XSLT fragment inside the translated template declaration for that element. For example, <!ATTLIST title year AS {../@year} CDATA #REQUIRED> is translated into:

<xsl:variable name="year" select="../@year"/>
<xsl:if test="count($year)>=1">
  <xsl:attribute name="year">
    <xsl:value-of select="$year"/>
  </xsl:attribute>
</xsl:if>
<xsl:value-of select="."/>
This fragment is placed in the template for the title element. As with element translation, it begins with a variable definition, followed by a cardinality constraint check; inside the check, the attribute is created (<xsl:attribute name="year">) with its value (<xsl:value-of select="$year"/>).
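A generator for this fragment might look like the following sketch. The function name translate_attribute is an assumption, and the sketch always emits the existence check shown above; how the XTL processor treats #IMPLIED or default values is not covered here.

```python
def translate_attribute(name, xpath):
    """Emit the XSLT fragment for one attribute mapped by an XPath."""
    return [
        # Bind the mapped node-set to a variable named after the attribute.
        f'<xsl:variable name="{name}" select="{xpath}"/>',
        # Create the attribute only if the mapping selected a node.
        f'<xsl:if test="count(${name})>=1">',
        f'  <xsl:attribute name="{name}">',
        f'    <xsl:value-of select="${name}"/>',
        "  </xsl:attribute>",
        "</xsl:if>",
    ]

print("\n".join(translate_attribute("year", "../@year")))
```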

Limitations

There are several limitations in this translation algorithm from XTL expressions to XSLT expressions.

You cannot use both an ORDER BY clause and a GROUP BY clause for the same element at the same time. You have to divide the transformation into two XTL expressions, one for GROUP BY and the other for ORDER BY.

Only parent elements are referable as variables.

Tag variables are not available.

User-defined functions are not available.

Comparison with Quilt

A query or transformation language has two functions: querying from input XML and defining output structure.

Query feature

Although the query features of both XTL and Quilt are based on XPath, the query feature of XTL is inferior to that of Quilt, because Quilt adds functionality such as the FOR and WHERE clauses and the AFTER and BEFORE operators. These are good supplements to XPath.

  1. FOR clause: It declares variables, and referring to those variables can simplify XPath expressions.
  2. WHERE clause: It simplifies XPath expressions and is sometimes more powerful than an XPath expression, especially when an XPath selection condition becomes deeply nested.
  3. AFTER, BEFORE: They simplify XPath expressions and are sometimes more powerful than an XPath expression in specifying the order of nodes.

These functions are probably motivated by limitations of XPath. I hope XPath 2.0 will resolve some of the existing issues of XPath 1.0.

Define output structure feature

The output structure definition feature of XTL is superior to that of Quilt because XTL defines the output structure with a grammar (an extended DTD). Quilt's output structure is based on an XML instance, so specifying a choice (| in DTD) or a recursive structure becomes rather complex. In addition, ANY in XTL is powerful for transforming a small part of the input XML data, because users have to specify only the part where a transformation is needed (in Quilt, users have to specify the whole structure from the root node, including the parts where no transformation is needed).

XTL expressions written in Quilt

I will show the examples from Section 2 written in Quilt. They are intended to clarify the differences between XTL and Quilt.

Projection while preserving structure

Figure 17: Quilt projection (1)
<bib>
  (
  FOR     $a IN document("bib.xml")/bib/book
  RETURN  <book>$a</book>
  )
</bib>
Figure 18: Quilt projection (2)
<bib>
  (
  FOR       $a IN document("bib.xml")/bib
    (
    FOR     $b IN $a/book | $a/article
    RETURN
       <book Year=$b/year/text()>
         <title>$b/title/text()</title>
           (
           FOR $c IN $b/author
           RETURN
             <author>
               <author_name>concat($c/firstname/text(), ' ',$c/lastname/text())</author_name>
             </author>
           )
         <publisher>$b/publisher/text()</publisher>
       </book>
    )
  )
</bib>

Selection

Figure 19: Quilt selection
<bib>
  (
  FOR      $a IN document("bib.xml")/bib/book
  WHERE    contains($a/title, "XML") AND $a/@year >1995
  RETURN
    <book>$a</book>
  )
</bib>

Rename

Figure 20: Quilt rename
<Bib>
  (
  FOR       $a IN document("bib.xml")/bib
    (
    FOR     $b IN $a/book | $a/article
    RETURN
      <Book Year=$b/year/text()>
         <Title>$b/title/text()</Title>
            (
            FOR $c IN $b/author
            RETURN
             <Author>
               <Firstname>$c/firstname/text()</Firstname>,
               <Lastname>$c/lastname/text()</Lastname>
             </Author>
            )
          <Publisher><name>$b/publisher/name/text()</name></Publisher>
      </Book>
    )
  )
</Bib>

Changing Structure

Figure 21: Quilt changing structure (1)
<bib>
  (
  FOR       $a IN DISTINCT document("bib.xml")//author
  RETURN    $a
  )
</bib>
Figure 22: Quilt changing structure (2)
<bib>
  (
  FOR            $a IN DISTINCT document("bib.xml")//author
  RETURN
    <author>
      <firstname>$a/firstname</firstname>,
      <lastname>$a/lastname</lastname>,
        (
        FOR      $b IN document("bib.xml")//title[../author=$a]
        RETURN   $b/title
        )
    </author>
  )
</bib>

Sort

Figure 23: Quilt sort
<bib>
  (
  FOR       $a IN DISTINCT document("bib.xml")//author
  RETURN    $a SORTBY(.)
  )
</bib>

Selection

Figure 24: Quilt selection (1)
<bib>
  (
  FOR       $a IN DISTINCT document("bib.xml")//author
  RETURN    $a SORTBY(.) DESCENDING
  )
</bib>
Figure 25: Quilt selection (2)
<bib>
  (
  FOR         $a IN DISTINCT document("bib.xml")//author
  RETURN      $a SORTBY(.)          
 // I am not sure this SORTBY position is correct
      (
      FOR     $b IN document("bib.xml")//title[../author=$a]
      RETURN  $b/title SORTBY(title)
      )
  )
</bib>

Join

Figure 26: Quilt join
<bib>
  (
  FOR         $a IN document("BookCatalogue.xml")/Book
  RETURN    
      <book>
         $a/Title
         (
            FOR    $c IN $a/Author
            RETURN $c
         )
         $a/Date
         $a/ISBN
         $a/Publisher
         (
            FOR    $b IN document("BookCosts.xml")/BookCosts/Book[Title=$a/Title]/Cost
            RETURN $b
         )
      </book>
  )
</bib>

Appendix: XTL Syntax

I describe the syntax of XTL in BNF. This BNF is based on the DTD grammar in the XML specification and extends it according to the extensions described in Section 3.1.

Figure 27: BNF syntax (restruct2.xsl)
xtl ::= (markupdecl)* 

// XTL does not support EntityDecl, NotationDecl, and PI 
markupdecl  ::= elementdecl | AttlistDecl | Comment 

// element type declaration 
elementdecl ::= '<!ELEMENT' Name XPath? contentspec '>' 

// for tag variable and handling #PCDATA same with element.
Name        ::=  (Letter | '_' | ':') (NameChar)* | '$' (NameChar)* | #PCDATA

NameChar  ::=  Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender 

XPath    ::=  'AS' nodeSetExpr

nodeSetExpr ::=
           nodeSetExpr '+' nodeSetExpr /* concatenation */
      |    nodeSetExpr '-' nodeSetExpr /* difference */
      |    nodeSetExpr '*' nodeSetExpr /* intersection */
      |    '(' nodeSetExpr ')'
      |    XPathExpr

contentspec ::= 'EMPTY' | 'ANY' | children 

children    ::= (choice | seq) ('?' | '*' | '+')? 

choice      ::= '(' cp ( '|' cp )* ')' 

seq         ::= '(' cp ( ',' cp )* ')' 

cp          ::= (Name XPath? | choice | seq) ('?' | '*' | '+')? groupby? orderby? 

groupby ::= 'GROUP' 'BY' XPath+ 

orderby ::= 'ORDER' 'BY' XPath+ 

// attribute list declaration 
AttlistDecl ::= '<!ATTLIST' Name AttDef* '>' 

AttDef      ::= Name XPath? AttType DefaultDecl 

AttType     ::= StringType | TokenizedType | EnumeratedType  

StringType  ::= 'CDATA' 

TokenizedType::= 'ID'|'IDREF'|'IDREFS'|'ENTITY'|'ENTITIES'|'NMTOKEN'|'NMTOKENS'

// XTL does not support Notation Type 
EnumeratedType::=  Enumeration  
Enumeration ::= '(' Nmtoken ('|' Nmtoken)* ')' 

Nmtoken     ::= (NameChar)+ 

DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED')? AttValue) 

AttValue    ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" 

Reference   ::= CharRef 

CharRef     ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' 

// comment declaration 
Comment     ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
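As a rough illustration of the cp production, the following Python sketch parses a single content particle. It is not the XTL processor's parser and it makes several assumptions: XPath expressions are delimited by braces as in the examples of this paper (the BNF above does not show the braces), an XPath contains no braces itself, only one key is given per GROUP BY or ORDER BY clause, and nested choice or seq particles are not handled. The names CP and parse_cp are hypothetical.

```python
import re

CP = re.compile(
    r"""^\s*
        (?P<name>[#$\w:][\w.:\-]*)                   # Name, incl. tag variable
        (?P<card>[?*+]?)                             # cardinality indicator
        (?:\s+AS\s+\{(?P<xpath>[^{}]*)\})?           # optional XPath mapping
        (?:\s+GROUP\s+BY\s+\{(?P<group>[^{}]*)\})?   # optional GROUP BY key
        (?:\s+ORDER\s+BY\s+\{(?P<order>[^{}]*)\})?   # optional ORDER BY key
        \s*$""",
    re.VERBOSE,
)

def parse_cp(text):
    """Split one content particle into its named parts."""
    match = CP.match(text)
    if match is None:
        raise ValueError(f"not a simple content particle: {text!r}")
    # Parts that are absent are reported as None; a missing
    # cardinality indicator is reported as the empty string.
    return match.groupdict()

print(parse_cp("title* AS {//title[../author=$author]} ORDER BY {.}"))
print(parse_cp("author* AS {//author} GROUP BY {.}"))
```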

Bibliography

[A query language for XML] Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, Dan Suciu, In International World Wide Web Conference, 1999. http://www.research.att.com/~mff/files/final.html.

[Comparative Analysis of Five XML Query Language] Angela Bonifati, Stefano Ceri, SIGMOD Record, vol.29, no.1, March 2000.

[Querying XML Data] Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, David Maier, Dan Suciu, IEEE Data Engineering Bulletin vol. 22, no.3, 10-18, 1999.

[Quilt: an XML Query Language] Jonathan Robie, Don Chamberlin, Daniela Florescu, http://www.almaden.ibm.com/cs/people/chamberlin/quilt_euro.html.

[XML Path Language (XPath) Version 1.0] W3C, http://www.w3.org/TR/xpath

[XML Query Languages: Experiences and Exemplars] Mary Fernandez, Jerome Simeon, Philip Wadler, http://www.w3.org/1999/09/ql/docs/xquery.html.

[XPath Language Requirement Version 2.0] W3C, http://www.w3.org/TR/xpath20req

[XQL FAQ] Jonathan Robie, http://metalab.unc.edu/xql/

[XSL Transformations (XSLT) Version 1.0] W3C, http://www.w3.org/TR/xslt.

[XSL Transformations (XSLT) Version 1.1] W3C, http://www.w3.org/TR/xslt11.


