On mapping from colloquial XML to RDF using XSLT

C. M. Sperberg-McQueen
Eric Miller

Abstract

XML vocabularies can be characterized as those designed for the convenience of authors or software developers, called colloquial, and those designed to have a trivial mapping to a non-XML data structure, which we call non-colloquial. Mapping colloquial vocabularies into other formats (e.g., symbolic logic or RDF) is a powerful tool for making colloquial XML tractable. Specifying this mapping is a way of documenting what the elements and attributes are supposed to mean and how they are to be used. If this is done only in English prose, humans can make use of it, but not machines. If machine-readable syntax is used to specify a mapping from the XML vocabulary into some well-known target syntax, the mapping can benefit both humans and machines. Simple examples illustrate how mappings can be defined using XSLT and how they can be attached to the schema defining the XML vocabulary.

Keywords: RDF; XSLT; Mapping

C. M. Sperberg-McQueen

C. M. Sperberg-McQueen is a member of the technical staff at the World Wide Web Consortium; he chairs the W3C XML Schema Working Group and XML Coordination Group.

Eric Miller

Eric Miller is the Activity Lead for the World Wide Web Consortium (W3C) Semantic Web Initiative.

His responsibilities include architectural and technical leadership in the design and evolution of Semantic Web infrastructure. They also include working with W3C Working Group members so that both working groups in the Semantic Web Activity, as well as other W3C activities, produce Web standards that support Semantic Web requirements; building support among user and vendor communities for the Semantic Web by illustrating the benefits to those communities and the means of participating in the creation of a metadata-ready Web; and establishing liaisons with other technical standards bodies involved in Web-related technology, to ensure compliance with existing Semantic Web standards and to collect requirements for future W3C work.

Before joining the W3C, Eric was a Senior Research Scientist at OCLC Online Computer Library Center, Inc. and the co-founder and Associate Director of the Dublin Core Metadata Initiative, an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.

Eric holds a Research Scientist appointment at MIT's Laboratory for Computer Science.

C. M. Sperberg-McQueen [World Wide Web Consortium, MIT Computer Science and AI Laboratory]
Eric Miller [World Wide Web Consortium, MIT Computer Science and AI Laboratory]

Extreme Markup Languages 2004® (Montréal, Québec)

Copyright © 2004 C. M. Sperberg-McQueen and Eric Miller. Reproduced with permission.

Introduction

Let us begin by trying to make explicit some assumptions we are making which may or may not be fully shared by others.

The mapping problem

An application of XML or SGML defines what some people call a markup language, and other people would prefer to refer to as a markup vocabulary or namespace. Since some people prefer to reserve the term markup language for meta-languages like XML and SGML, the following discussion will use the term vocabulary — without, however, intending to obscure the fact that the XML-based applications in question do have rules that go beyond the provision of names and may be captured in whole or in part by syntactic formalisms.

Many people (vocabulary designers, schema and DTD authors, application developers, people trying to make it easier to work with documents in markup languages designed by others, and no doubt others, too) wish to say, for specific constructs in a vocabulary, what they mean. By the constructs of a vocabulary we mean primarily the element (type)s, attributes, notations, processing-instruction targets, and entities defined in that vocabulary; in some cases, it is convenient also to include simple or complex datatypes and substitution groups (as in XML Schema 1.0), non-terminals (as in Relax and Relax NG), classes (as in the ODD system used to generate the Text Encoding Initiative DTDs), or other abstractions under this term.

Some of those who wish to say what markup constructs mean wish to do so using some machine-processable notation; others would be happy with better tools for human-understandable documentation. We are here concerned mostly with the former, though good rules for machine-processable specification of meaning may also help make meaning clear to humans.

Two difficulties attend any effort to say what markup constructs mean. First of all, different people have very different ideas of what would be involved. And second, if such an attempt is not to remain a purely individual mental exercise, the results must be written down or spoken in some language with its own syntactic rules. What may have started as an attempt to focus on semantics to the exclusion of syntax thus concludes by looking like just another translation from one syntax to another. Let us examine these two difficulties in more detail.

First, different people have very different ideas of what it would mean, for the constructs of a vocabulary, to say what they mean. For purposes of discussion, we identify five. The first four are all mapping problems in one way or another:

  • Some people mean by this that they wish to be able to specify how data structures internal to some application software are serialized as XML, or how XML is de-serialized into data structures; questions like "When does an element become an object of class Foo, and when does it become an object of class Foobar?", asked with reference to some set of object classes defined in some programming language, are central to their concerns. Call this the concrete data-structure mapping problem. An example of this approach is [Krupnikov/Thompson 2001].
  • Others wish to specify how to map XML document instances into columns, rows, and tables in some SQL database management system; sometimes they wish to specify a mapping into new rows of existing tables, and sometimes what is needed is a mapping which would specify which new tables to create. Call this the abstract data structure mapping problem. It differs from the concrete data structure mapping problem as the abstraction of a SQL table differs from the various programming-language constructs which might be used to implement the abstraction. See [Vorthmann/Buck 2000a], [Vorthmann/Buck 2000b].
  • Still others wish to specify a mapping into first-order predicate calculus as a way of defining the correct interpretation of markup. Call this the FOPC mapping problem. Cf. [Sperberg-McQueen/Huitfeldt/Renear 2001a].
  • Some wish to map arbitrary XML into RDF.[1] Call this the RDF mapping problem. See for example [Hazaël-Massieux/Connolly 2004].

These four mapping problems seem to cover the most frequently discussed ground among those interested primarily in machine-processable descriptions of meaning, but we have no proof that the classification is necessarily exhaustive, and nothing in the further argument requires that it be exhaustive.

Some people believe that the four mapping problems described above do not necessarily have much in common. Others believe that all of them are at root ‘the same thing’. Mostly, they seem to mean by this that if a formalism is provided for what they wish to do, they believe that everyone else's requirements will be met. They do not, in general, seem to mean that if anyone else's requirements are met, they will be able to do what they wish to do.

The four mapping problems identified above do have in common that they involve defining a meaning-preserving mapping from XML notation into some other model. Let us call this other model the target model. If the target model has a syntax in which it can be serialized, let us call that syntax the target formalism. We will be concerned only with target models which can be serialized in this way; it may be possible to extend our proposals to some models with non-serial notations, but not to ineffable models (those to which no notation at all is adequate).

If the target model has a corresponding target formalism, then all four of the mapping problems can be conceived of as involving the translation of information from one syntax (XML) into some other syntax. A mapping problem may thus be conceived of as a syntax-to-syntax translation even if, in practice, the result desired is not a string of characters denoting some abstraction, but some other representation of the abstraction (such as an in-memory data structure).
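To make the notion of a syntax-to-syntax translation concrete, here is a minimal Python sketch that rewrites a small XML fragment as predicate-calculus-style atoms. The `book` vocabulary, the predicate names, and the function `to_atoms` are hypothetical, invented purely for illustration; they come from no vocabulary or tool discussed in this paper.

```python
import xml.etree.ElementTree as ET


def to_atoms(xml_text: str) -> list[str]:
    """Translate hypothetical <book> markup into predicate-style atoms.

    Each <book> element is assigned a constant (b1, b2, ...); its title
    child and year attribute become binary predicates over that constant.
    """
    root = ET.fromstring(xml_text)
    atoms = []
    for n, book in enumerate(root.iter("book"), start=1):
        const = f"b{n}"
        atoms.append(f"book({const})")
        atoms.append(f'title({const}, "{book.findtext("title")}")')
        atoms.append(f'year({const}, {int(book.get("year"))})')
    return atoms


if __name__ == "__main__":
    source = '<shelf><book year="1994"><title>Guidelines</title></book></shelf>'
    for atom in to_atoms(source):
        print(atom)
```

The output is just a string of characters in another notation; whether it preserves the meaning of the source depends entirely on whether the predicates chosen correspond to what the vocabulary designer intended.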

A fifth idea of what it means to describe the meaning of a vocabulary should also be mentioned:

  • Some wish to communicate enough information about each construct in a vocabulary to other human beings to enable them to recognize and use the elements and attributes correctly. Call this the documentation problem.

The documentation problem is known to be soluble, but the solution is not easy: intelligent humans must write clear natural-language descriptions of the vocabulary, and attentive humans must read them and interpret them correctly. This is straightforward but not automatable. Numerous vocabularies for describing markup vocabularies have been developed and used over the years; their use may make the construction of useful, fairly complete documentation easier, but they cannot make it mechanical. Nothing in this paper reduces the importance of the documentation problem or makes it any easier to solve.[2]

Any solution to the documentation problem produces, by definition, correct understanding on the part of a hearer or reader. Since such understanding is a prerequisite to the creation of any meaning-preserving transformation or mapping, it may be noted that the solution of the documentation problem appears to be a prerequisite to any solution of any of the mapping problems, except where the mapping problems are solved by the original specifier of a vocabulary without the need for communication with any other humans. The converse is not true: solutions to the mapping problems are neither prerequisites nor necessary consequences of solutions to the documentation problem. Solving the mapping problems, on the other hand, would make it possible to perform more useful work with marked up data without involving the need for quite so many attentive human programmers. This might be advantageous because attentive human programmers are commonly in short supply.

The fifth idea of meaning brings us to the second difficulty identified above. Reducing the mapping problem to a translation problem operating at the level of syntax may trouble some, particularly those interested in the documentation problem. If we regard the realm of meanings as distinct in some way from that of utterances or syntax, we are bound to be disappointed in a mapping-oriented solution, because the solution seems to shift ground from the ethereal to the mundane. But strictly speaking, any useful formulation of semantics is reducible in this way to a problem operating at the level of syntax. Any attempt to say what markup means necessarily involves constructing some utterance in some perceivable form. That utterance can only be described and interpreted in terms of some syntax: without syntax, all meanings are ineffable.

The involvement of the syntactic layer does not (pace some reviewers of this paper) render the mapping problems mentioned above meaningless, nor does it divorce them from meaning. The mapping problems are not solved by arbitrary mappings from XML into the target models, but only by mappings which retain the meaning of the original.[3] (In specialized cases, it may suffice for practical purposes to capture only part of the meaning.) It is easy to dismiss these as merely pushing a bump in the rug from one location to another: having translated from XML into some other notation, are we not still faced with the task of specifying the meaning of that other notation? In cases where the target notation is as opaque to us as the original notation, the criticism has some justice. When the target notation is well understood, however, the translation does precisely what is needed. And we stress again: every successful explanation takes the form of translation from one syntax into another. The documentation problem is also a mapping problem and differs from the others only in substituting the syntax of English or French or some other natural language for the machine-processable target syntaxes of the other views. On the positive side, the fact that all specification of semantics is thus reducible to a problem in specifying syntactic transformations means that we can directly exploit without embarrassment the long history of work on mechanisms for syntax-driven transformations of marked up data.

Since in the long run every notation must be explained to be useful, it is an inescapable prerequisite for any useful work with marked up data that the documentation problem be solved for some notation or other. Since in the long run one of the main reasons for using markup is to reduce the need for human intervention in routine information processing, however, solving the documentation problem alone will not suffice to allow us to exploit markup to full advantage. Hence this paper's emphasis on machine-processable target notations.

Colloquial XML and non-colloquial XML

Some applications of XML in use today obey strict rules for mapping XML constructs into constructs in some underlying data model non-isomorphic to XML. RDF is an example: every XML construct in an RDF data stream maps into an RDF triple, a part of a triple, or a set of triples, using relatively straightforward rules. Similarly, every XML construct in TEI feature system markup maps into a feature structure, a feature, a value of a feature, or a set of feature structures, following simple and unvarying rules (see [ACH/ACL/ALLC 1994] or [Langendoen/Simons 1995]). XML in Layman normal form, or in any of the normal forms distinguished by Henry Thompson [Thompson 2001], has a simple mapping into labeled graph structures.

Many applications of XML in use today emphasize convenience for authors or software developers over simplicity of the mapping to any underlying data model. Some applications do not specify any underlying model different from the basic XML data model of nodes in a tree, with arbitrary links expressed by ID/IDREF links or by application-level information. TEI, HTML, and DocBook are examples of such applications. Following Noah Mendelsohn, we refer to the XML used by applications of this sort as colloquial XML. This allows us to suggest the term non-colloquial XML for XML whose structure is dictated by the desire to have a trivial mapping to a non-XML data structure.

Even in the case of non-colloquial XML, the mapping problems outlined above may be worth contemplation. Obviously, one will seldom need to solve the RDF mapping problem for information already in RDF (although it may be interesting to consider the problem of mapping RDF into different RDF with similar or identical semantics, e.g. as part of normalization). But mapping RDF into concrete or abstract data structures or into predicate calculus may easily become complex enough that it will be convenient to have tools to make the mapping easier to understand and specify. Similar considerations apply to all the output formats.

In the remainder of this paper, however, we focus on mapping colloquial XML into other formats, specifically logical form and RDF. Defining such mappings for colloquial XML helps clarify the intended semantics of the markup (at least for readers who can understand the target notations) and encourages vocabulary designers to be explicit about distinctions which matter for such semantically based mappings. And by mapping XML documents from specialized vocabularies into a common underlying data model, we can make it easier to merge information from multiple sources and to reuse the information represented in XML documents. At the moment, such merger and reuse requires intervention by humans who understand the semantics of the source markup, which may or may not be well documented and may or may not have been followed correctly by the data provider. Every step we can take toward making it easier to capture the meaning of markup vocabularies in machine-tractable form is a step toward better tools for performing such mergers, for managing evolution of vocabularies and data, and for building more robust systems. Mapping from arbitrary XML into semantically equivalent logical form or RDF is one such step.

The mapping problem as a schema annotation problem

Some people may take the view that the mapping problem is an illusion which only arises because XML vocabulary designers have fallen into bad habits owing to XML's lack of any binding semantic model. If designers would refrain from using colloquial idioms in XML, this line of reasoning goes, and if they would instead simply use a particular non-colloquial vocabulary or design their vocabularies within a particular non-colloquial meta-vocabulary, then there would be no mapping problem. Some (early) discussions of RDF seem to take this view fairly explicitly.

We do not share this view, because we believe that colloquial and non-colloquial XML have different strengths and are suitable for different uses. In particular, we note that many users of colloquial vocabularies are empirically unwilling to abandon them in favor of any of the non-colloquial vocabularies currently on offer.

Others may believe that the mapping problem (or, strictly speaking, its solution) is fundamentally a problem of language design and use. What is needed, on this account of things, is a language in which to describe the mapping from XML to the target model; the problem, in this view, is to design and use a language for describing mappings. Traces of this view can be found in the so-called “Cambridge Communiqué” [Swick/Thompson 1999] and in some discussions of it. From this point of view, the problem is simply: to design a language for use inside an XML Schema xsd:annotation element to specify the mapping of an XML vocabulary into a preferred semantic notation.

We are sympathetic to this view, but without being wholly committed to it. We note:

  • Since the mapping problems can be reduced to problems of translating information from one syntax to another, it is strictly speaking unnecessary to design a new language: existing languages for transforming XML documents into new XML vocabularies or into non-XML syntaxes can be used to specify mappings. [Ogbuji 2001] and [Hazaël-Massieux/Connolly 2004] illustrate this point.
  • Even if existing languages are more verbose, more general, and more complex than necessary or desirable for a pure mapping language, any design of a mapping language should be based on the analysis of actual mappings.

Getting a better grip on our information, knowing what it means and what it is and is not plausible to do with it, is an important part of building better information systems. The ability to make explicit more of the meaning of a vocabulary is useful in allowing members of distributed communities to work independently of each other, adding information to common resources and changing the form in which information is represented without requiring that all existing information stores be retrofitted with the new form of representation. More explicit semantics is also important in maintaining the integrity of information. The ability of a vocabulary designer or schema author to ‘annotate’ a schema document with information about how to map from the vocabulary into one or more chosen target syntaxes is the focus of this paper.

In this paper we walk through one simple example of mapping from a colloquial XML vocabulary into two non-colloquial notations: first-order predicate logic and RDF. We explore the use of XSLT as a language for specifying mappings from colloquial XML to logic and RDF, describe some methods for associating such mappings with XML Schema documents, and provide input into the design requirements of any future mapping language.

A simple example

In order to keep things simple, we choose first a very simple example of colloquial XML.

The vocabulary

The DTD

The source material we are interested in is an XML representation of a time log. The vocabulary is very simple; in its entirety, the DTD reads:

Figure 1: DTD for time log data [File timelog.dtd]
<!--* Timelog.dtd: record time periods spent, by project and category.
    * 
    * Revisions:
    * 2002-01-04 : made DTD to move existing data into XML
    *-->
<!--* To do:
    * allow paragraphs of prose annotation in TEI Lite or HTML.
    *   (this would require DTD modules for paragraphs and phrase-level
    *   elements. Might be a good test of DTD modularization)
    *-->

<!ENTITY % kw.DATE "NMTOKEN">
<!ENTITY % kw.TIME "NMTOKEN">
<!ENTITY % kw.N    "NMTOKEN">

<!ENTITY % a.daycounts "
          workdays %kw.N;    #IMPLIED
          holidays %kw.N;    #IMPLIED
          satsuns  %kw.N;    #IMPLIED
">

<!ELEMENT timelog (p*, period+) >
<!ATTLIST timelog  %a.daycounts;
          label    CDATA   #IMPLIED
          xmlns    CDATA   "http://example.org/mcxrx/timelog#">
<!ELEMENT period  (p*, (logentry* | period*)) >
<!ATTLIST period   %a.daycounts;
          label    CDATA   #IMPLIED>
<!ELEMENT logentry (#PCDATA) >
<!ATTLIST logentry
          date     %kw.DATE; #REQUIRED
          start    %kw.TIME; #REQUIRED
          end      %kw.TIME; #REQUIRED
          dur      %kw.N;    #REQUIRED
          project  NMTOKEN   #REQUIRED
          category NMTOKEN   #REQUIRED
>
<!ELEMENT p  (#PCDATA) >

For purposes of the discussion, we imagine (counterfactually) that this DTD is made available on the Web at the URI http://example.org/mcxrx/timelog.dtd.[4]

The schema is equally simple:

Figure 2: XML Schema document for time log data [File timelog.xsd]
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://example.org/mcxrx/timelog#"
  xmlns="http://example.org/mcxrx/timelog#"
  elementFormDefault="qualified" 
>
 <xsd:annotation>
  <xsd:documentation>
   A simple schema for timelog data: record time periods spent, by
   project and category.

   Revisions:
   2004-03-24: specify schema
   2002-01-04: made DTD to move existing data into XML
  </xsd:documentation>
 </xsd:annotation>

 <xsd:annotation>
  <xsd:documentation>
    To do:
    * allow paragraphs of prose annotation in TEI Lite or HTML.
      (this would require schema modules for paragraphs and phrase-level
      elements. Might be a good demo of schema modularization)
  </xsd:documentation>
 </xsd:annotation>

 <xsd:complexType name="container">
  <xsd:sequence>
   <xsd:element ref="p" minOccurs="0" maxOccurs="unbounded"/>
   <xsd:choice>
    <xsd:element ref="period" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element ref="logentry" minOccurs="0" maxOccurs="unbounded"/>
   </xsd:choice>
  </xsd:sequence>

  <xsd:attribute name="label" type="xsd:string"/>
  <xsd:attribute name="workdays" type="xsd:nonNegativeInteger"/>
  <xsd:attribute name="holidays" type="xsd:nonNegativeInteger"/>
  <xsd:attribute name="satsuns"  type="xsd:nonNegativeInteger"/>
 </xsd:complexType>

 <xsd:complexType name="toplevelContainer">
  <xsd:complexContent>
   <xsd:restriction base="container">
    <xsd:sequence>
     <xsd:element ref="p" minOccurs="0" maxOccurs="unbounded"/>
     <xsd:choice>
      <xsd:element ref="period" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="logentry" minOccurs="0" maxOccurs="0"/>
     </xsd:choice>
    </xsd:sequence>
   </xsd:restriction>
  </xsd:complexContent>
 </xsd:complexType>

 <xsd:complexType name="logentry" mixed="true">
  <xsd:sequence/>
  <xsd:attribute name="date" type="xsd:date"/>
  <xsd:attribute name="start" type="xsd:token"/>
  <xsd:attribute name="end" type="xsd:token"/>
  <xsd:attribute name="dur" type="xsd:nonNegativeInteger"/>
  <xsd:attribute name="project" type="xsd:token"/>
  <xsd:attribute name="category" type="xsd:token"/>
 </xsd:complexType>

 <xsd:element name="timelog" type="toplevelContainer"/>
 <xsd:element name="period"  type="container"/>
 <xsd:element name="p">
  <xsd:complexType mixed="true"/>
 </xsd:element>
 <xsd:element name="logentry"  type="logentry"/>
</xsd:schema>
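As a rough illustration of what the DTD and schema declarations ask of a logentry, the following Python sketch checks a parsed element for the attributes the DTD marks #REQUIRED and for the lexical shape of date and dur. It is not a validator and stands in for neither a DTD nor an XML Schema processor. The function name `check_logentry` and the message wording are assumptions, and the checks simplify matters: xsd:date also permits a time zone suffix, and xsd:nonNegativeInteger permits a leading plus sign, neither of which is handled here.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Namespace taken from the schema's targetNamespace, in ElementTree notation.
NS = "{http://example.org/mcxrx/timelog#}"
# Attributes declared #REQUIRED on logentry in the DTD.
REQUIRED = ("date", "start", "end", "dur", "project", "category")


def check_logentry(entry: ET.Element) -> list[str]:
    """Return a list of human-readable problems found on one logentry."""
    problems = [f"missing attribute: {name}"
                for name in REQUIRED if entry.get(name) is None]
    day = entry.get("date")
    if day is not None:
        try:
            date.fromisoformat(day.strip())  # simplified stand-in for xsd:date
        except ValueError:
            problems.append(f"date is not a calendar date: {day!r}")
    dur = entry.get("dur")
    if dur is not None and not re.fullmatch(r"[0-9]+", dur.strip()):
        problems.append(f"dur is not a non-negative integer: {dur!r}")
    return problems
```

Because ElementTree does not read the external DTD, the default xmlns attribute declared there is not supplied automatically; documents given to such a sketch would have to carry the namespace declaration explicitly.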

The meanings of these constructs can be paraphrased informally thus:

timelog: a quantity of timelog data; in practice, in this vocabulary this is usually a log for a single calendar month, and invariably a log for a single individual. Attributes include:

  • label: a human-readable label identifying the period in question; not guaranteed machine-parseable
  • workdays: the number of working days in the period; this is used for calculating the average hours per work week spent on given tasks. In some sense this is redundant: a sufficiently intelligent system could examine the date labels occurring and figure out how many of them are workdays. On the other hand, if a work day passes without any work being done, and thus without any entries in the time log, that work day nevertheless needs to be counted in the arithmetic. In practice, this attribute is often not specified, and inaccurate when specified.
  • holidays: the number of holidays in the period
  • satsuns: the number of weekend days in the period

period: an arbitrary subunit of a time log; in practice, this often groups together the log entries for a given week or day; periods can nest. Attributes are as for timelog; there is no conceptual difference between the two, the outermost period being identified as a timelog rather than a period solely for convenience in processing.

p: a paragraph of human-readable prose describing or commenting on the work done in a given period

logentry: a single entry in the time log, describing a period of time (hereinafter a chunk of time) dedicated to a single task (or, in degenerate cases, a period of time treated as a chunk despite being devoted to multiple tasks, or despite lacking any useful information about what the time actually went to). Legal attributes are:

  • date: the date on which the chunk occurred, in ISO 8601 (yyyy-mm-dd) form; each chunk occurs on exactly one date. Date boundaries are in local time.
  • start: the time at which the chunk started, in local time. In practice, this is normally the same as the end time of the preceding entry in the log; the default display styles for this vocabulary check for this condition and flag discontinuities visually, since they may indicate errors in the data. The value is not in ISO 8601 format because it lacks a seconds field; also, in practice the existing application uses a full stop rather than a colon as an hour-minutes separator.
  • end: the time at which the chunk ended, in local time. In practice, usually the same as the start time of the following entry in the log.
  • dur: the number of minutes in the chunk; normally this can be calculated by subtracting start from end, but in some cases this is not so. A time zone shift may be recorded thus:

<logentry 
      date="2004-01-10" 
      start="15.56" end="16.56" 
      dur="0" 
      project="other:" 
      category="other:">time 
zone shift</logentry>
(No machine-processable information is included about which time zone, exactly, is local time, and time zone shifts are not required by the semantics of the markup. They are included in order to reduce the number of places at which the sanity checking in the default style sheets suspects a possible error.)

  • project: the project (or notional account) to which this time chunk is allocated for purposes of time accounting (the vocabulary allows any value here; data entry software is used to ensure a reasonably consistent set of values); typical values are:

      • xmlschema: work on the XML Schema Working Group
      • xsl: work on the XSL Working Group
      • xmlcg: work related to the XML Coordination Group
      • arch: work related to the W3C Architecture Domain
      • w3c: W3C work other than XML Schema, XML CG, or Architecture Domain
      • mep: work on the Model Editions Partnership
      • extreme: work on the conference Extreme Markup Languages
      • prof: professional work on projects other than those currently being identified more specifically
      • personal: the chunk went for personal, not work-related, activities
      • ovhd: work classed as ‘overhead’: not directly related to any particular project or account, but work-related nonetheless (e.g. cleaning spam filters, maintaining time logs, upgrading operating system)
      • other: unclassifiable

  • category: the kind of activity to which the chunk was devoted: telcon, making an agenda, attending face to face meeting, drafting or revising minutes, phone call with an individual, reading email, writing email, mix of reading and writing email, giving a talk, chatting on IRC, reading a paper, learning a piece of software, drafting or revising a paper, working on software or spec requirements, various other development activities, down time.
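The relationships among start, end, and dur described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of the existing style sheets: the function names are hypothetical, times are taken to use a full stop as separator (possibly with leading white space, as the data entry software sometimes writes them), and a chunk is assumed never to cross midnight, since each chunk occurs on exactly one date.

```python
def clock_to_minutes(value: str) -> int:
    """Convert a log time such as '17.01' or ' 6.30' to minutes since midnight."""
    hours, minutes = value.strip().split(".")
    return int(hours) * 60 + int(minutes)


def dur_is_consistent(start: str, end: str, dur: str) -> bool:
    """True when dur equals end minus start, in minutes."""
    return clock_to_minutes(end) - clock_to_minutes(start) == int(dur)


def discontinuities(chunks: list[tuple[str, str]]) -> list[int]:
    """Return indexes of (start, end) chunks whose start differs from the
    end of the chunk immediately before them in the log."""
    flagged = []
    for i in range(1, len(chunks)):
        previous_end = clock_to_minutes(chunks[i - 1][1])
        if clock_to_minutes(chunks[i][0]) != previous_end:
            flagged.append(i)
    return flagged
```

Applied to the time zone shift entry shown earlier (start 15.56, end 16.56, dur 0), `dur_is_consistent` returns false, which is the intended reading: the entry keeps start and end times continuous while recording that no time actually elapsed.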

Sample data

Some sample data may help illustrate the usage of the vocabulary:

    <logentry date="2004-03-23" start="17.01" end="17.32" dur="31"
      project="xmlschema" category="agenda">revising ten-week
      plan</logentry>
    <logentry date="2004-03-23" start="17.32" end="17.53" dur="21"
      project="xmlschema" category="phone">NI, discuss
      ten-week plan and doc dates</logentry>
    <logentry date="2004-03-23" start="17.53" end="18.42" dur="49"
      project="xmlschema" category="agenda">revising ten-week plan,
      send to WG</logentry>
    <logentry date="2004-03-23" start="18.42" end="19.06" dur="24"
      project="personal" category="other:">ironing</logentry>
    <logentry date="2004-03-23" start="19.06" end="19.10" dur="4"
      project="overhead" category="email">Email sorting</logentry>
    <logentry date="2004-03-23" start="19.10" end="19.47" dur="37"
      project="personal" category="dogfeed">Feeding dogs</logentry>
    <logentry date="2004-03-23" start="19.47" end="19.59" dur="12"
      project="w3c" category="think">trying to find server to
      look at simile data</logentry>
    <logentry date="2004-03-23" start="19.59" end="20.10" dur="11"
      project="overhead" category="implementation">revising
      timelog.rexx to provide XSL category (and some other
      changes)</logentry>
    <logentry date="2004-03-23" start="20.10" end="20.59" dur="49"
      project="prof" category="other:">print out Lisbon papers for
      review</logentry>
    <logentry date="2004-03-23" start="20.59" end="21.33" dur="34"
      project="overhead" category="readmail"></logentry>
    <logentry date="2004-03-24" start=" 6.30" end=" 6.33" dur="3"
      project="overhead" category="other">startup time</logentry>
    <logentry date="2004-03-24" start=" 6.33" end="07.21" dur="48"
      project="w3c" category="docdraft">working on time log example
      for EM</logentry>
    <logentry date="2004-03-24" start="07.21" end=" 8.08" dur="47"
      project="personal" category="meal">breakfast</logentry>
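For readers interested in the concrete data-structure mapping problem, here is a minimal Python sketch that reads log entries of this kind into objects and totals the minutes per project. The class `LogEntry` and the functions `read_entries` and `minutes_by_project` are hypothetical names chosen for this illustration, and the enclosing timelog and period elements with an explicit namespace declaration are added here so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass

NS = "{http://example.org/mcxrx/timelog#}"


@dataclass
class LogEntry:
    date: str
    start: str
    end: str
    minutes: int
    project: str
    category: str
    note: str


def read_entries(xml_text: str) -> list[LogEntry]:
    """Build one LogEntry per logentry element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        LogEntry(
            date=el.get("date"),
            start=el.get("start").strip(),
            end=el.get("end").strip(),
            minutes=int(el.get("dur")),
            project=el.get("project"),
            category=el.get("category"),
            note=" ".join((el.text or "").split()),  # collapse line breaks
        )
        for el in root.iter(NS + "logentry")
    ]


def minutes_by_project(entries: list[LogEntry]) -> dict[str, int]:
    """Sum the dur values of the entries, grouped by project."""
    totals: dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry.project] += entry.minutes
    return dict(totals)


SAMPLE = """<timelog xmlns="http://example.org/mcxrx/timelog#"><period>
    <logentry date="2004-03-23" start="17.01" end="17.32" dur="31"
      project="xmlschema" category="agenda">revising ten-week
      plan</logentry>
    <logentry date="2004-03-23" start="17.32" end="17.53" dur="21"
      project="xmlschema" category="phone">NI, discuss
      ten-week plan and doc dates</logentry>
    <logentry date="2004-03-23" start="17.53" end="18.42" dur="49"
      project="xmlschema" category="agenda">revising ten-week plan,
      send to WG</logentry>
    <logentry date="2004-03-23" start="18.42" end="19.06" dur="24"
      project="personal" category="other:">ironing</logentry>
</period></timelog>"""

if __name__ == "__main__":
    print(minutes_by_project(read_entries(SAMPLE)))
```

The sketch sums dur rather than recomputing durations from start and end, since the vocabulary treats dur as the authoritative count of minutes.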

A prose transcription captures the meaning of the marked up data fairly concisely; later, we'll try to formalize this better.

  • On 2004-03-23, from 5:01 to 5:32 p.m., NN[5] spent 31 minutes revising ‘the ten-week plan’. This should be accounted for as time devoted to the project “xmlschema”. The category of activity is “agenda”.
  • On 2004-03-23, from 5:32 to 5:53 p.m., NN spent 21 minutes on the phone with NI, to discuss the ten-week plan and document publication dates. This should be accounted for as time devoted to the project “xmlschema”. The category of activity is “phone”.
  • On 2004-03-23, from 5:53 to 6:42 p.m., NN spent 49 minutes revising the ten-week plan and sending it to the WG. This should be accounted for as time devoted to the project “xmlschema”. The category of activity is “agenda”.
  • On 2004-03-23, from 18:42 to 19:06, NN spent 24 minutes ironing shirts. This should be accounted for as time devoted to the project “personal”. The category of activity is “other:”.
  • On 2004-03-23, from 19:06 to 19:10, NN spent 4 minutes sorting email. This should be accounted for as time devoted to the project “overhead”. The category of activity is “email”.
  • On 2004-03-23, from 19:10 to 19:47, NN spent 37 minutes feeding dogs. This should be accounted for as time devoted to the project “personal”. The category of activity is “dogfeed”.
  • On 2004-03-23, from 19:47 to 19:59, NN spent 12 minutes trying to find the server to look at the sample Simile data. This should be accounted for as time devoted to the project “w3c”. The category of activity is “think”.
  • On 2004-03-23, from 19:59 to 20:10, NN spent 11 minutes revising timelog.rexx to provide an XSL category (and some other changes). This should be accounted for as time devoted to the project “overhead”. The category of activity is “implementation”.
  • On 2004-03-23, from 20:10 to 20:59, NN spent 49 minutes printing out papers to review for a conference. This should be accounted for as time devoted to the project “prof”. The category of activity is “other:”.
  • On 2004-03-23, from 20:59 to 21:33, NN spent 34 minutes reading email. This should be accounted for as time devoted to the project “overhead”. The category of activity is “readmail”.
  • On 2004-03-24, from 6:30 to 6:33, NN spent 3 minutes starting up his machine. This should be accounted for as time devoted to the project “overhead”. The category of activity is “other”.
  • On 2004-03-24, from 6:33 to 07:21, NN spent 48 minutes working on the time log example for EM. This should be accounted for as time devoted to the project “w3c”. The category of activity is “docdraft”.
  • On 2004-03-24, from 07:21 to 8:08, NN spent 47 minutes eating breakfast. This should be accounted for as time devoted to the project “personal”. The category of activity is “meal”.

It should be noted at the outset that some information in the paraphrases above is not explicit in the XML markup or content: two obvious examples are the fact that the individual whose time is being logged is NN and the fact that when NN spends time ironing, what he irons is shirts. To understand these facts from the marked up data, it is necessary and sufficient to understand some basic facts about the context in which the marked up data is designed to be created and used. A good inference engine could make the appropriate inferences, given the relevant additional facts.

The successful paraphrase of the activity descriptions from their sometimes telegraphic style into full English sentences also requires more knowledge than is explicit in the marked up data: it requires a command of English. This is not something any current software can be expected to have or acquire soon.

Notes on the vocabulary

The vocabulary presented is in private use by a single individual; the semi-controlled vocabulary used for projects and categories changes only slowly, and the vocabularies are controlled only by the data entry form, not by the DTD. Knowledge of the vocabularies is built into the processing software: for example, the fact that “work” minutes are the sum of schema minutes, w3c minutes, professional minutes, and overhead minutes, or that w3c minutes are the sum of the minutes spent on the “xmlcg”, “arch”, and “w3c” projects, and so on.
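The kind of roll-up knowledge built into the processing software can be sketched in a few lines of Python. This is only an illustration, not the actual software: the text names just “xmlcg”, “arch”, and “w3c” as the components of w3c minutes, so the other rows of the table below (that schema, professional, and overhead minutes correspond to the project codes “xmlschema”, “prof”, and “overhead”) are our assumptions, as are all function and variable names.

```python
from collections import defaultdict

# (project, minutes) pairs copied from the thirteen sample log entries.
ENTRIES = [
    ("xmlschema", 31), ("xmlschema", 21), ("xmlschema", 49),
    ("personal", 24), ("overhead", 4), ("personal", 37),
    ("w3c", 12), ("overhead", 11), ("prof", 49), ("overhead", 34),
    ("overhead", 3), ("w3c", 48), ("personal", 47),
]

# Hypothetical roll-up table: each summary bucket lists the project codes
# whose minutes it sums. Only the "w3c" row is stated in the text.
ROLLUP = {
    "schema": ["xmlschema"],
    "w3c": ["xmlcg", "arch", "w3c"],
    "prof": ["prof"],
    "overhead": ["overhead"],
}
WORK_BUCKETS = ["schema", "w3c", "prof", "overhead"]


def minutes_by_project(entries):
    """Sum the logged minutes per project code."""
    totals = defaultdict(int)
    for project, minutes in entries:
        totals[project] += minutes
    return totals


def work_minutes(entries):
    """'Work' minutes are the sum of the four work buckets."""
    per_project = minutes_by_project(entries)
    return sum(per_project[p] for bucket in WORK_BUCKETS for p in ROLLUP[bucket])


print(work_minutes(ENTRIES))  # 262 of the 370 minutes logged in the sample
```

None of this knowledge is recoverable from the markup alone, which is why it has to be stated somewhere outside the DTD.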

The vocabulary could easily be adapted for use by a group of people, whose log entries might be merged; in that case, (a) different methods of vocabulary control would be needed, and (b) the individual (or other entity) whose time is being logged would have to be made explicit in the XML. Also (c) it would make sense to have explicit records of the relations among various projects, and constraints (such as that the “meal” category is only allowed for the “personal” project).

The target syntax

A simple representation in logical form

Two kinds of object need to be distinguished in the logical form: chunks and time periods. For convenience in establishing inter-object links, we'll supply an arbitrary identifier (unique within a particular data stream) for each chunk or period.

A chunk is represented by a member of the chunk relation. One straightforward representation is

chunk(Date, Start_time, End_time, Duration,
      Project, Category, Description)

in which each attribute of the logentry element is represented as an argument to the predicate, as is the #PCDATA content. Another representation adds a unique identifier for the chunk, for use in relating chunks to the time periods which contain them.

chunk(ID, Date, Start_time, End_time, Duration,
      Project, Category, Description)

It seems a more plausible representation of the actual meaning of the log data, however, to treat the date, start, end, and dur attributes as identifying a specific time interval, with timestamps for the starting and ending points in time or with a starting timestamp and a duration:

chunk(ID, interval(Start, Duration),
      Project, Category, Description)
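
The interval view also makes the redundancy among the start, end, and dur attributes easy to check. The following Python sketch derives a starting timestamp and a duration from the date, start, and end values; it assumes that times are written as hours and minutes separated by a full stop, possibly with a leading blank, and that both times fall on the given date. The function names are ours.

```python
from datetime import datetime, timedelta


def parse_stamp(date, hhmm):
    """Combine a date like '2004-03-23' with a time like ' 6.30' or '19.10'."""
    hours, minutes = hhmm.strip().split(".")
    return datetime.strptime(date, "%Y-%m-%d") + timedelta(
        hours=int(hours), minutes=int(minutes))


def interval(date, start, end):
    """Return (start timestamp, duration in minutes) for one log entry."""
    begin = parse_stamp(date, start)
    finish = parse_stamp(date, end)
    return begin, int((finish - begin).total_seconds() // 60)


begin, minutes = interval("2004-03-23", "19.10", "19.47")
print(begin.isoformat(), minutes)  # 2004-03-23T19:10:00 37
```

The computed duration agrees with the dur="37" recorded for this entry, so an interval(Start, Duration) term loses nothing that the four separate attributes carry.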

A time period is represented by a member of the period relation:

period(ID,Label, Workdays, Holidays, Weekenddays)

The links between periods are represented by members of the contains relation:

contains(Outer_period_id,Inner_period_id)

The fact that a chunk occurs within a given period is represented by a member of the includes relation:

includes(Period_id,Chunk_id)

Note that other logical representations are possible. Some of the more obvious variations include:

  • A different ontology might be used, which postulates the existence of different kinds of objects.
  • The project and category to which a chunk of time is assigned might be represented not as strings of characters but as entities in their own right; in particular, they might be associated with well known (or other) URIs, as is usual in RDF.

Ways of modeling an n-tuple

Note that there are several ways of translating from the n-tuples above into the set of binary relations required by RDF.

If each log entry is viewed as a tuple of the form

chunk(Date,Start,End,Dur,Proj,Cat,Desc)

and no identifier is assigned to the chunk (e.g. because our ontological scruples do not allow us to postulate the existence of chunks as individuals in the sense sometimes used in formal logic), then the problem of reducing the tuple to RDF resembles the problem of reducing an n-ary function to unary functions. The n-ary function is replaced by a unary function which takes one of the n arguments and returns a function of arity n-1; that function accepts the other arguments and returns the desired result (or, more frequently, is itself reduced in the same way to a unary function which takes one argument and returns a function of lesser arity). This is well understood under the name currying.6

Applying one currying-inspired approach to the tuple above, we would have, for some particular date, start-time, end-time, etc.:

       (dsedpcd_tuple(A)
        & sedpcd_tuple(B)
        & edpcd_tuple(C)
        & dpcd_tuple(D)
        & pcd_tuple(E)
        & cd_tuple(F)
        & d_tuple(G)
        & date(A,"2004-03-24") 
        & start(B,"07.21") 
        & end(C,"8.08") 
        & dur(D,"47") 
        & project(E,"personal") 
        & category(F,"meal")
        & desc(G,"breakfast")
        & r1(A,B)
        & r2(B,C)
        & r3(C,D)
        & r4(D,E)
        & r5(E,F)
        & r6(F,G))

Here, the first seven lines assert the types of various tuples and sub-tuples A, B, C, ... G, the next seven associate the various literal values given for the date, etc., with them, while the last six lines link the tuples up in an appropriate chain.

The approach just shown requires us to postulate several types of tuple, which may itself be ontologically troubling to us, so we may prefer a slightly different method, which makes use of a single tuple predicate:

          tuple(A,dsedpcd)
        & tuple(B,sedpcd)
        & tuple(C,edpcd)
        & tuple(D,dpcd)
        & tuple(E,pcd)
        & tuple(F,cd)
        & tuple(G,d)
        ...

The remainder of the translation is as before.

Since either of these translations could in theory take the arguments in any order, there are 5040 (7!) variations of each. In fact we could also group arguments into subtuples, so there are actually more ways to reduce the chunk predicate to a set of binary relations.
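
The chain construction is mechanical enough to sketch in Python. The following is a minimal illustration of the single-predicate variant, not a definitive implementation: the node names A, B, C, ... and the linking relations r1, r2, ... follow the example above, while the function name is ours.

```python
from math import factorial
from string import ascii_uppercase


def curry_to_binary(fields):
    """Reduce an ordered list of (name, value) pairs to binary relations.

    Each argument gets its own sub-tuple node (A, B, C, ...), typed with the
    single tuple/2 predicate; r1, r2, ... chain consecutive nodes together.
    """
    names = "".join(name[0] for name, _ in fields)   # 'dsedpcd' for the log entry
    facts = []
    for i, (name, value) in enumerate(fields):
        node = ascii_uppercase[i]
        facts.append(("tuple", node, names[i:]))      # type of the sub-tuple
        facts.append((name, node, value))             # literal attached to it
        if i > 0:
            facts.append(("r%d" % i, ascii_uppercase[i - 1], node))  # chain link
    return facts


entry = [("date", "2004-03-24"), ("start", "07.21"), ("end", "8.08"),
         ("dur", "47"), ("project", "personal"), ("category", "meal"),
         ("desc", "breakfast")]
facts = curry_to_binary(entry)
print(len(facts))              # 20 facts: 7 types, 7 literals, 6 links
print(factorial(len(entry)))   # 5040 possible argument orders
```

Passing the pairs in a different order yields a different, equally valid chain, which is where the 5040 variations come from.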

The translation into RDF is much simpler if we overcome whatever philosophical hesitation we might have about assuming the existence of time chunks; that allows us to represent the logentry elements in a more straightforward way in RDF (or using binary predicates):

(exists L)(logentry(L)
    & date(L,"2004-03-24") 
    & start(L,"07.21") 
    & end(L,"8.08") 
    & dur(L,"47") 
    & project(L,"personal") 
    & category(L,"meal")
    & desc(L,"breakfast"))

This is easier for humans to understand, and it is unlikely that any human designer would curry the predicate instead of postulating the existence of time chunks. But the choice does exist and must be made when specifying a mapping from colloquial XML into logical form or into RDF.
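
Once the log entry is treated as an individual, the reduction to binary relations is a flat list of statements about one subject. A minimal Python sketch follows; the function name and the identifier L1 are ours, chosen only for illustration, and stripping the blank padding from attribute values is an assumption about the intended values.

```python
def entry_to_triples(entry_id, attributes, description):
    """Return (subject, predicate, object) triples for one log entry."""
    triples = [(entry_id, "type", "logentry")]
    for name in ("date", "start", "end", "dur", "project", "category"):
        # Attribute values may carry a leading blank (e.g. ' 8.08'); drop it.
        triples.append((entry_id, name, attributes[name].strip()))
    triples.append((entry_id, "desc", description))
    return triples


attributes = {"date": "2004-03-24", "start": "07.21", "end": " 8.08",
              "dur": "47", "project": "personal", "category": "meal"}
triples = entry_to_triples("L1", attributes, "breakfast")
for triple in triples:
    print(triple)
```

Every triple has the same subject, so no chain of intermediate nodes is needed and the order of the attributes no longer matters.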

RDF

The simplest RDF translation of the sample closely resembles the form given above:

<Chunk rdf:ID="NN_2004-03-23T17.01/17.32">
      <who>NN</who>
      <date>2004-03-23</date>
      <start>17.01</start>
      <end>17.32</end>
      <dur>31</dur>
      <project>xmlschema</project>
      <category>agenda</category>
      <description>revising ten-week
      plan</description>
</Chunk>

A slightly more interesting translation assumes we may wish to associate further information with the projects and categories, and so assigns a URI to each of them.

Figure 3: [File sample.rdf]
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
    <!ENTITY rdfns 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY timelogns 'http://example.org/mcxrx/timelog#'>
    <!ENTITY people 'http://example.org/mcxrx/people#'>
    <!ENTITY projects 'http://example.org/mcxrx/projects#'>
    <!ENTITY categories 'http://example.org/mcxrx/categories#'>
]>

<rdf:RDF
    xmlns:rdf="&rdfns;" 
    xmlns="&timelogns;">

<Period rdf:ID="id2589187">
      <label>Sample</label>
</Period>

<Period rdf:ID="id2588575">
      <label>A short sample</label>
</Period>

<Period rdf:ID="id2591870">
      <label>Tue 23 Mar</label>
</Period>

<Chunk rdf:ID="NN_2004-03-23T17.01/17.32">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>17.01</start>
      <end>17.32</end>
      <dur>31</dur>
      <project rdf:resource="&projects;xmlschema" />
      <category rdf:resource="&categories;agenda" />
      <description>revising ten-week
      plan</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T17.32/17.53">
     <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>17.32</start>
      <end>17.53</end>
      <dur>21</dur>
      <project rdf:resource="&projects;xmlschema" />
      <category rdf:resource="&categories;phone" />
      <description>NI, discuss
      ten-week plan and doc dates</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T17.53/18.42">
     <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>17.53</start>
      <end>18.42</end>
      <dur>49</dur>
      <project rdf:resource="&projects;xmlschema" />
      <category rdf:resource="&categories;agenda" />
      <description>revising ten-week plan,
      send to WG</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T18.42/19.06">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>18.42</start>
      <end>19.06</end>
      <dur>24</dur>
      <project rdf:resource="&projects;personal" />
      <category rdf:resource="&categories;other:" />
      <description>ironing</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T19.06/19.10">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>19.06</start>
      <end>19.10</end>
      <dur>4</dur>
      <project rdf:resource="&projects;overhead" />
      <category rdf:resource="&categories;email" />
      <description>Email sorting</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T19.10/19.47">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>19.10</start>
      <end>19.47</end>
      <dur>37</dur>
      <project rdf:resource="&projects;personal" />
      <category rdf:resource="&categories;dogfeed" />
      <description>Feeding dogs</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T19.47/19.59">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>19.47</start>
      <end>19.59</end>
      <dur>12</dur>
      <project rdf:resource="&projects;w3c" />
      <category rdf:resource="&categories;think" />
      <description>trying to find server to
      look at simile data</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T19.59/20.10">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>19.59</start>
      <end>20.10</end>
      <dur>11</dur>
      <project rdf:resource="&projects;overhead" />
      <category rdf:resource="&categories;implementation" />
      <description>revising
      timelog.rexx to provide XSL category (and some other
      changes)</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T20.10/20.59">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>20.10</start>
      <end>20.59</end>
      <dur>49</dur>
      <project rdf:resource="&projects;prof" />
      <category rdf:resource="&categories;other:" />
      <description>print out Lisbon papers for
      review</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-23T20.59/21.33">
      <who rdf:resource="&people;NN" />
      <date>2004-03-23</date>
      <start>20.59</start>
      <end>21.33</end>
      <dur>34</dur>
      <project rdf:resource="&projects;overhead" />
      <category rdf:resource="&categories;readmail" />
</Chunk>

<Period rdf:ID='id259208'>
  <label>Wed 24 Mar</label>
</Period>

<Chunk rdf:ID="NN_2004-03-24T6.30/6.33">
      <who rdf:resource="&people;NN" />
      <date>2004-03-24</date>
      <start>6.30</start>
      <end>6.33</end>
      <dur>3</dur>
      <project rdf:resource="&projects;overhead" />
      <category rdf:resource="&categories;other" />
      <description>startup time</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-24T6.33/07.21">
      <who rdf:resource="&people;NN" />
      <date>2004-03-24</date>
      <start>6.33</start>
      <end>07.21</end>
      <dur>48</dur>
      <project rdf:resource="&projects;w3c" />
      <category rdf:resource="&categories;docdraft" />
      <description>working on time log example
      for EM</description>
</Chunk>

<Chunk rdf:ID="NN_2004-03-24T07.21/8.08">
      <who rdf:resource="&people;NN" />
      <date>2004-03-24</date>
      <start>07.21</start>
      <end>8.08</end>
      <dur>47</dur>
      <project rdf:resource="&projects;personal" />
      <category rdf:resource="&categories;meal" />
      <description>breakfast</description>
</Chunk>

</rdf:RDF>
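
The way the entity references in Figure 3 expand into full URIs can be made concrete with a small Python sketch that builds a single Chunk element using the standard library. It is an illustration under stated assumptions, not the export stylesheet: the namespace URIs are those declared in the figure, the function name is ours, and the identifier is formed exactly as in the figure.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TIMELOG = "http://example.org/mcxrx/timelog#"
PEOPLE = "http://example.org/mcxrx/people#"
PROJECTS = "http://example.org/mcxrx/projects#"
CATEGORIES = "http://example.org/mcxrx/categories#"


def chunk_element(who, date, start, end, dur, project, category, description):
    """Build one Chunk element; resources are named by namespace + code."""
    chunk = ET.Element("{%s}Chunk" % TIMELOG)
    chunk.set("{%s}ID" % RDF, "%s_%sT%s/%s" % (who, date, start, end))

    def resource(tag, uri):
        # Property whose value is another resource, identified by URI.
        ET.SubElement(chunk, "{%s}%s" % (TIMELOG, tag)).set("{%s}resource" % RDF, uri)

    def literal(tag, value):
        # Property whose value is a plain string.
        ET.SubElement(chunk, "{%s}%s" % (TIMELOG, tag)).text = value

    resource("who", PEOPLE + who)
    literal("date", date)
    literal("start", start)
    literal("end", end)
    literal("dur", dur)
    resource("project", PROJECTS + project)
    resource("category", CATEGORIES + category)
    if description:  # entries with empty content get no description property
        literal("description", description)
    return chunk


chunk = chunk_element("NN", "2004-03-23", "19.10", "19.47", "37",
                      "personal", "dogfeed", "Feeding dogs")
print(chunk.find("{%s}project" % TIMELOG).get("{%s}resource" % RDF))
```

The printed value is http://example.org/mcxrx/projects#personal, the URI that the reference &projects;personal denotes in the figure.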

The XSLT

We'll build the stylesheet up in phases, first doing just the simple things and gradually elaborating until we have a transformation which produces the full logical representation shown above; then we'll make a companion stylesheet to generate RDF.

Version 0.1: handling chunks

The stylesheet

We'll start by just extracting the instances of the chunk/7 relation chunk(Date, Start, End, Duration, Project, Category, Description).

The stylesheet framework is the usual one:

The beginning of the stylesheet will have pretty much exactly the same form in all the versions we describe in this paper:

Figure 5: Stylesheet DTD and start-tag
<?xml version='1.0'?>
<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
      '../../../People/cmsmcq/lib/xslt10.dtd' [
<!ENTITY nl "&#xA;">
]>
<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>

<!--* A stylesheet which translates timelog data into logical form.
    * Revisions:
    * 2004-04-16 : v 0.6, finish for Extreme submission
    * 2004-03-24 : v 0.1, just do chunk/7 predicate.
    * 
    * To do:
    * v 0.1 chunk/7
    * v 0.2 chunk/8 + period
    * v 0.3 add contains_p_p/2, contains_p_c/2
    * v 0.4 add some implicit information, standard facts
    * v 0.5 add existential quantification
    * v 0.6 add some static inference rules
    *-->

This code is used in < “Stylesheet 0.1: chunks [File log.logic.01.xsl] ” 4 >< “Stylesheet 0.2: chunks with IDs [File log.logic.02.xsl] ” 12 >< “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >

So will the end of the stylesheet:

Figure 6: Stylesheet end-tag
</xsl:stylesheet>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-default-dtd-file:"/SGML/Public/Emacs/xslt.ced"
sgml-omittag:t
sgml-shorttag:t
sgml-indent-data:t
sgml-indent-step:1
End:
-->

This code is used in < “Stylesheet 0.1: chunks [File log.logic.01.xsl] ” 4 >< “Stylesheet 0.2: chunks with IDs [File log.logic.02.xsl] ” 12 >< “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >< “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

The document root doesn't actually require any special handling, but in the interests of labeling the output, we emit a comment saying what the output actually is and where it came from:

Figure 7: Handling the document root
 <xsl:template match="/">
  <xsl:text>/* Logical representation of time log data &nl;</xsl:text>
  <xsl:text> * generated by timelog.to.logic.xsl &nl;</xsl:text>
  <xsl:text> */&nl;&nl;</xsl:text>
  <xsl:apply-templates/>
  <xsl:text>&nl;</xsl:text>
 </xsl:template>

This code is used in < “Stylesheet 0.1: chunks [File log.logic.01.xsl] ” 4 >< “Stylesheet 0.2: chunks with IDs [File log.logic.02.xsl] ” 12 >

Because we want to produce output in plain text format (suitable for loading into, say, a Prolog system), we need to tell the XSLT processor to produce text output rather than XML:

Figure 8: Output declaration
 <xsl:output method="text" media-type="text/plain"/>

This code is used in < “Stylesheet 0.1: chunks [File log.logic.01.xsl] ” 4 >< “Stylesheet 0.2: chunks with IDs [File log.logic.02.xsl] ” 12 >< “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >

The heart of the translation is the handling of the log entry. For each logentry element, we want to generate one member of the chunk relation, with the appropriate arguments. Using the notation of XSLT's attribute value templates, we might say schematically that the string we want to generate is

chunk({@date}, {@start}, {@end}, {@dur},
   {@project}, {@category}, "{string(.)}")

When a given logentry element is the current element, the XPath expression @date denotes the (value of the) date attribute; in attribute value templates, the braces around XPath expressions indicate that the expressions are to be evaluated, rather than becoming a literal part of the string. The last argument (string(.)) denotes the string value of the element's content; note the quotation marks around it, which ensure that it is syntactically recognizable as a string.

In practice, if we want to load the output into a Prolog system, more quotation marks are useful: single quotes around the date, time, project, and category values will ensure that the values are read as single atoms even if they contain decimal points, colons, hyphens, etc.

chunk('{@date}', '{@start}', '{@end}', {@dur},
   '{@project}', '{@category}', "{string(.)}")

We do not quote the duration, since we want it read as an integer.

If we write this string into an XSLT template, however, we won't get quite the results we wish: while the attribute value template notation is convenient, it is used, as the name implies, only for certain attribute values. To generate the appropriate strings, we will need to use a slightly more cumbersome notation:

Figure 9: Handling a single log entry
<xsl:template match="logentry">
chunk('<xsl:value-of select="@date"/>', 
      '<xsl:value-of select="@start"/>', 
      '<xsl:value-of select="@end"/>', 
      <xsl:value-of select="@dur"/>, 
      '<xsl:value-of select="@project"/>', 
      '<xsl:value-of select="@category"/>', 
      "<xsl:value-of select="string(.)"/>").
</xsl:template>

This code is used in < “Stylesheet 0.1: chunks [File log.logic.01.xsl] ” 4 >

Variations on this pattern are possible, in order to vary the layout of the output and of the XSLT stylesheet.

Once we have this template for logentry elements, we have most of what we need. The default rules for XSLT will process other elements by recurring on their children, which means we will effectively ignore all timelog and period elements and find all logentry elements at whatever depth.

On the other hand, the default rule for text nodes is to write them out to the output. We don't want the content of the p element in the output. So we suppress the processing of the p element:

The white space in the source document, however, still gets copied to the output; this is not a serious problem, but it is unsightly. So we suppress all text nodes in the input document:

Figure 11: [continues 10 Suppressing character data ]
 <xsl:template match="text()"/>
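
For readers more at home in a general-purpose language, the effect of this first stylesheet can be approximated in Python. The sketch below is not the stylesheet itself: the sample document is a cut-down stand-in whose element nesting we assume from the element names mentioned above, the function name is ours, and, unlike string(.), it collapses white space inside the description (as the reformatted output does). Like the stylesheet, it makes no attempt to escape quotation marks occurring in the data.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the input: logentry elements nested inside period elements.
SAMPLE = """<timelog><period label="Sample"><p>notes</p>
<logentry date="2004-03-23" start="19.10" end="19.47" dur="37"
  project="personal" category="dogfeed">Feeding dogs</logentry>
<logentry date="2004-03-23" start="20.59" end="21.33" dur="34"
  project="overhead" category="readmail"></logentry>
</period></timelog>"""


def chunk_clause(entry):
    """Format one logentry element as a chunk/7 clause."""
    text = " ".join("".join(entry.itertext()).split())
    return ("chunk('{date}', '{start}', '{end}', {dur}, "
            "'{project}', '{category}', \"{text}\").").format(
                text=text, **entry.attrib)


root = ET.fromstring(SAMPLE)
# iter() finds logentry elements at any depth, as the default rules do;
# the p element and stray white space are simply never visited.
clauses = [chunk_clause(entry) for entry in root.iter("logentry")]
print("\n".join(clauses))
```

The explicit search for logentry elements plays the role of both the default recursion and the two suppressing templates.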

The output

The stylesheet given above produces the following output (reformatted for compactness):

chunk('2004-03-23', '17.01', '17.32', 31, 
      'xmlschema', 'agenda', "revising ten-week plan").
chunk('2004-03-23', '17.32', '17.53', 21, 
      'xmlschema', 'phone', "NI, discuss ten-week plan and doc dates").
chunk('2004-03-23', '17.53', '18.42', 49, 
      'xmlschema', 'agenda', "revising ten-week plan, send to WG").
chunk('2004-03-23', '18.42', '19.06', 24, 
      'personal', 'other:', "ironing").
chunk('2004-03-23', '19.06', '19.10', 4, 
      'overhead', 'email', "Email sorting").
chunk('2004-03-23', '19.10', '19.47', 37, 
      'personal', 'dogfeed', "Feeding dogs").
chunk('2004-03-23', '19.47', '19.59', 12, 
      'w3c', 'think', "trying to find server to look at simile data").
chunk('2004-03-23', '19.59', '20.10', 11, 
      'overhead', 'implementation', 
      "revising timelog.rexx to provide XSL category (and some other changes)").
chunk('2004-03-23', '20.10', '20.59', 49, 
      'prof', 'other:', "print out Lisbon papers for review").
chunk('2004-03-23', '20.59', '21.33', 34, 
      'overhead', 'readmail', "").
chunk('2004-03-24', '6.30', '6.33', 3, 
      'overhead', 'other', "startup time").
chunk('2004-03-24', '6.33', '07.21', 48, 
      'w3c', 'docdraft', "working on time log example for EM").
chunk('2004-03-24', '07.21', '8.08', 47, 
      'personal', 'meal', "breakfast").

Version 0.2: giving chunks unique identifiers

The second version of the XSLT transformation will provide a unique identifier for each chunk. It doesn't much matter which method we use to generate the identifier, as long as the result is unique for the chunk.

We'll also generate instances of the period relation, again with unique identifiers.

The stylesheet

Adding unique identifiers

The stylesheet framework is almost the same as before:

The only difference is in handling the log entries. One simple way to generate a unique identifier for each chunk is to use the XSLT generate-id function. This is guaranteed to generate a distinct identifier for each element in the input document. And since each chunk corresponds to a distinct element in the input, it will give us an identifier for each chunk which is unique in the data stream:

Figure 13: Handling a single log entry (version 2a)
<xsl:template match="logentry">
chunk('<xsl:value-of select="generate-id()"/>', 
      '<xsl:value-of select="@date"/>', 
      '<xsl:value-of select="@start"/>', 
      '<xsl:value-of select="@end"/>', 
      <xsl:value-of select="@dur"/>, 
      '<xsl:value-of select="@project"/>', 
      '<xsl:value-of select="@category"/>', 
      "<xsl:value-of select='string(.)'/>").
</xsl:template>

This code is not used elsewhere.

The identifiers generated by generate-id for chunks contained in different XML files, however, are not guaranteed distinct. They also won't be guaranteed unique if we ever wish to merge log data for multiple individuals. For any individual, however, we know that there should be only one chunk with the same date and the same start and end times; the individual can be in only one place at a time, and billing the same time chunk to two different accounts violates the principles of time accounting.7 So we can construct an identifier for the chunk that way. And while we're worrying about possible merger with other data later, we'll add an argument identifying the individual.

Figure 14: Handling a single log entry (version 2b)
<xsl:template match="logentry">
chunk('<xsl:value-of select="concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))"/>', 
      'NN',
      '<xsl:value-of select="@date"/>', 
      '<xsl:value-of select="@start"/>', 
      '<xsl:value-of select="@end"/>', 
      <xsl:value-of select="@dur"/>, 
      '<xsl:value-of select="@project"/>', 
      '<xsl:value-of select="@category"/>', 
      "<xsl:value-of select='string(.)'/>").
</xsl:template>

This code is used in < “Stylesheet 0.2: chunks with IDs [File log.logic.02.xsl] ” 12 >< “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >
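
The identifier scheme itself is independent of XSLT. As a rough Python equivalent (the function names are ours, and Python's notion of white space is slightly broader than that of XPath's normalize-space):

```python
def normalize_space(value):
    """Approximate XPath normalize-space(): trim and collapse white space."""
    return " ".join(value.split())


def chunk_id(who, date, start, end):
    """Identifier built from the individual, the date, and the two times."""
    return "{}_{}T{}/{}".format(
        who, date, normalize_space(start), normalize_space(end))


print(chunk_id("NN", "2004-03-24", " 6.30", " 6.33"))  # NN_2004-03-24T6.30/6.33
```

Because the individual is part of the identifier, two people logging the same interval receive different identifiers, which is what a later merger of log data requires.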

Period data

Generating a clause for each period in the input is straightforward. We just add a template to handle the timelog and period elements. The only complication is that since in practice the workdays and other attributes are often supplied manually, they are not always present, and when present not always accurate. Since our task is to interpret the markup, not to clean up the meaning, we do nothing here about inaccurate values. But if the value is not supplied, we either need to invent yet another relation with a different arity, or we need to allow a special value to which we assign the meaning ‘unknown’.

Figure 15: Handling a period
<xsl:template match="period|timelog">
period(<xsl:value-of select="generate-id()"/>,
      '<xsl:value-of select="@label"/>', 
      <xsl:choose>
        <xsl:when test="@workdays">
          <xsl:value-of select="@workdays"/>
        </xsl:when>
        <xsl:otherwise>'unknown'</xsl:otherwise>
      </xsl:choose>, 
      <xsl:choose>
        <xsl:when test="@holidays">
          <xsl:value-of select="@holidays"/>
        </xsl:when>
        <xsl:otherwise>'unknown'</xsl:otherwise>
      </xsl:choose>, 
      <xsl:choose>
        <xsl:when test="@satsuns">
          <xsl:value-of select="@satsuns"/>
        </xsl:when>
        <xsl:otherwise>'unknown'</xsl:otherwise>
      </xsl:choose>).
<xsl:apply-templates/>
</xsl:template>

This code is used in < “Stylesheet 0.2: chunks with IDs [File log.logic.02.xsl] ” 12 >< “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >
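
The choice made here, one relation of fixed arity with a special value for missing data, can be stated compactly in Python. In this sketch the identifier is passed in as a parameter, because the standard library offers no counterpart to generate-id; that parameter and the function name are ours.

```python
import xml.etree.ElementTree as ET


def period_clause(period, period_id):
    """Format a period/5 clause, substituting 'unknown' for absent counts."""
    counts = [period.get(name, "'unknown'")
              for name in ("workdays", "holidays", "satsuns")]
    return "period({}, '{}', {}).".format(
        period_id, period.get("label", ""), ", ".join(counts))


p = ET.fromstring('<period label="Tue 23 Mar" workdays="1"/>')
print(period_clause(p, "id1"))  # period(id1, 'Tue 23 Mar', 1, 'unknown', 'unknown').
```

Values that are present are passed through unchecked, in keeping with the decision to interpret the markup rather than to clean it up.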

Sample output data

Version 0.2 of the stylesheet produces the following output for the short sample (reformatted for compactness):

/* Logical representation of time log data 
 * generated by timelog.to.logic.xsl 
 */


period(id2589187, 'Sample', 
       'unknown', 'unknown', 'unknown').
period(id2588575, 'A short sample', 
       'unknown', 'unknown', 'unknown').
period(id2591870, 'Tue 23 Mar', 
       'unknown', 'unknown', 'unknown').

chunk('NN_2004-03-23T17.01/17.32', 
      'NN', '2004-03-23', '17.01', '17.32', 31, 
      'xmlschema', 'agenda', "revising ten-week plan").
chunk('NN_2004-03-23T17.32/17.53', 
      'NN', '2004-03-23', '17.32', '17.53', 21, 
      'xmlschema', 'phone', 
      "NI, discuss ten-week plan and doc dates").
chunk('NN_2004-03-23T17.53/18.42', 
      'NN', '2004-03-23', '17.53', '18.42', 49, 
      'xmlschema', 'agenda', 
      "revising ten-week plan, send to WG").
chunk('NN_2004-03-23T18.42/19.06', 
      'NN', '2004-03-23', '18.42', '19.06', 24, 
      'personal', 'other:', "ironing").
chunk('NN_2004-03-23T19.06/19.10', 
      'NN', '2004-03-23', '19.06', '19.10', 4, 
      'overhead', 'email', "Email sorting").
... etc. 

Version 0.6: containment, existential quantification, inference rules

The final version of the log-entry-to-logic transformation extends the foregoing in a few simple ways.

Capturing containment relations

First, we add rules to capture the containment relations between time periods and between time periods and the log entries in them. Since every period element is the child either of a timelog element or of another period element, and since we are using the standard function generate-id to make unique identifiers for the periods, all we need to do is to write out a contains clause with the IDs generated for the parent and for the current element.

Figure 16: Capturing containment relations
 <xsl:template match="period" mode="containment">
contains(<xsl:value-of select="generate-id(..)"/>,
         <xsl:value-of select="generate-id()"/>).
  <xsl:apply-templates mode="containment"/>
 </xsl:template>

Continued in 17, 18

This code is used in < “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >

Log entries are very similar. Each logentry element is contained in a period, and we must write a clause for the includes relation with the ID of the enclosing period and the ID we generate for the log entry itself. The only complication comes from the more complex form of identifier we have chosen for log entries.

Figure 17: [continues 16 Capturing containment relations ]
 <xsl:template match="logentry" mode="containment">
includes(<xsl:value-of select="generate-id(..)"/>,
         '<xsl:value-of select="concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))"/>').
 </xsl:template>

And because we are generating containment clauses in a separate mode, we need to suppress the p element and text nodes again.

Figure 18: [continues 16 Capturing containment relations ]
 <xsl:template match="p" mode="containment"/>
 <xsl:template match="text()" mode="containment"/>

The templates just added produce the following clauses for our standard sample:

contains(id2588993,
         id2590277).
  
contains(id2590277,
         id2590286).
  
includes(id2590286,
         'NN_2004-03-23T17:01/17:32').
 
includes(id2590286,
         'NN_2004-03-23T17:32/17:53').
 
includes(id2590286,
         'NN_2004-03-23T17:53/18:42').
 
includes(id2590286,
         'NN_2004-03-23T18:42/19:06').
 
includes(id2590286,
         'NN_2004-03-23T19:06/19:10').
 
includes(id2590286,
         'NN_2004-03-23T19:10/19:47').
 
includes(id2590286,
         'NN_2004-03-23T19:47/19:59').
 
includes(id2590286,
         'NN_2004-03-23T19:59/20:10').
 
includes(id2590286,
         'NN_2004-03-23T20:10/20:59').
 
includes(id2590286,
         'NN_2004-03-23T20:59/21:33').
 
contains(id2590277,
         id2592222).
  
includes(id2592222,
         'NN_2004-03-24T6:30/6:33').
 
includes(id2592222,
         'NN_2004-03-24T6:33/07:21').
 
includes(id2592222,
         'NN_2004-03-24T07:21/8:08').
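
The same parent-to-child walk can be sketched in Python. The sample below is a cut-down stand-in for the real log, with nesting assumed from the clauses above; the identifiers id1, id2, ... merely play the part of the values produced by generate-id, and all function names are ours.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<timelog label="Sample"><period label="A short sample">
<period label="Tue 23 Mar">
<logentry date="2004-03-23" start="19.06" end="19.10" dur="4"
  project="overhead" category="email">Email sorting</logentry>
</period></period></timelog>"""


def containment(root):
    """Walk the tree and return contains/2 and includes/2 clauses."""
    ids = {}

    def node_id(el):
        # Stand-in for generate-id(): one arbitrary but stable name per element.
        return ids.setdefault(el, "id{}".format(len(ids) + 1))

    clauses = []

    def walk(parent):
        for child in parent:
            if child.tag == "period":
                clauses.append("contains({}, {}).".format(
                    node_id(parent), node_id(child)))
                walk(child)
            elif child.tag == "logentry":
                chunk = "NN_{}T{}/{}".format(
                    child.get("date"),
                    " ".join(child.get("start").split()),
                    " ".join(child.get("end").split()))
                clauses.append("includes({}, '{}').".format(
                    node_id(parent), chunk))

    walk(root)
    return clauses


clauses = containment(ET.fromstring(SAMPLE))
print("\n".join(clauses))
```

As in the stylesheet, each clause needs only the current element and its parent, so a single pass over the document is enough.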

Existential quantification

Next, we add some rules to generate explicit statements about the existence of certain things. If a log entry says a certain chunk of time was spent on a particular project P, doing work of a particular category C, then unless the markup is faulty we can infer that there is a project P and a category C.8 This seems to be worth capturing; we can do this with a simple rule that fires for each log entry:

Figure 19: Existential quantification of time chunks, projects, and categories
 <xsl:template match="logentry" mode="existence">
time_chunk('<xsl:value-of select="concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))"/>').
project(<xsl:value-of select="@project"/>).
category(<xsl:value-of select="@category"/>).
 </xsl:template>

This code is not used elsewhere.

This generates statements of the form:

time_chunk('NN_2004-03-23T20:59/21:33').
project(overhead).
category(readmail).

which assign categories or types (in the informal sense, not in that of any specific schema language) to the individuals named.

Some might prefer to generate explicit statements that the things named actually exist, and wish to see sentences more like the following:

exists('NN_2004-03-23T20:59/21:33').
time_chunk('NN_2004-03-23T20:59/21:33').
exists(overhead).
project(overhead).
exists(readmail).
category(readmail).
This can be done, if desired, by writing the template as follows.

Figure 20: Existential quantification of projects and categories
 <xsl:template match="logentry" mode="existence">
exists('<xsl:value-of select="concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))"/>').
time_chunk('<xsl:value-of select="concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))"/>').
exists(<xsl:value-of select="@project"/>).
project(<xsl:value-of select="@project"/>).
exists(<xsl:value-of select="@category"/>).
category(<xsl:value-of select="@category"/>).
 </xsl:template>

This code is not used elsewhere.

But the reader is reminded that in the more or less unanimous opinion of logicians since Kant, existence is not a predicate.

A similar template generates simple type predicates for the time periods:

Figure 21: [continues 23 Existential quantification of time chunks, projects, and categories ]
 <xsl:template match="timelog|period" mode="existence">
time_period(<xsl:value-of select="generate-id()"/>).
  <xsl:apply-templates mode="existence"/>
 </xsl:template>

On a practical note, the templates just added generate a large number of redundant statements: after the first time we are told project(xmlschema) there isn't that much point to seeing it again and again. If we wish, we can suppress the duplicates using any of the various techniques well known in the XSLT community for grouping. One way is to use keys. First, we define project and category as keys:

Figure 22: Keys for project and category codes
 <xsl:key name="projectcodes" match="logentry" use="@project"/>
 <xsl:key name="categories" match="logentry" use="@category"/>

This code is used in < “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >< “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

Then we modify the template for logentry shown above, to emit the project or category clause only for the first occurrence of the key in the document:

Figure 23: Existential quantification of time chunks, projects, and categories
 <xsl:template match="logentry" mode="existence">
time_chunk('<xsl:value-of select="concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))"/>').
  <xsl:if test="generate-id() = generate-id(key('projectcodes',@project)[1])">
project(<xsl:value-of select="@project"/>). 
  </xsl:if>
  <xsl:if test="generate-id() = generate-id(key('categories',@category)[1])">
category(<xsl:value-of select="@category"/>).
  </xsl:if>
 </xsl:template>

Continued in 21

This code is used in < “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >
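An alternative to testing each log entry in turn is to visit only the first log entry for each key value. The following is a minimal sketch of that variant, not the stylesheet used in this paper; it assumes the keys of Figure 22 and, to remain self-contained, emits only the project and category clauses.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Keys as in Figure 22 -->
  <xsl:key name="projectcodes" match="logentry" use="@project"/>
  <xsl:key name="categories" match="logentry" use="@category"/>

  <xsl:template match="/">
    <!-- Visit only the first logentry for each distinct project code -->
    <xsl:for-each select="//logentry[generate-id() =
                          generate-id(key('projectcodes', @project)[1])]">
      <xsl:text>project(</xsl:text>
      <xsl:value-of select="@project"/>
      <xsl:text>).&#xA;</xsl:text>
    </xsl:for-each>
    <!-- Likewise for each distinct category code -->
    <xsl:for-each select="//logentry[generate-id() =
                          generate-id(key('categories', @category)[1])]">
      <xsl:text>category(</xsl:text>
      <xsl:value-of select="@category"/>
      <xsl:text>).&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This groups all project clauses together and all category clauses together, at the cost of separating them from the time_chunk clauses of the entries in which they first occur.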

Static inference rules

So far, nothing we have done captures certain information intrinsic to this vocabulary. We know, for example, that if a time period A contains another time period B, and B contains C, then A contains C: containment is transitive. And if period C includes some log entries, then they are included as well (indirectly) in any time period containing C.

On another more vocabulary-specific note: the projects can be grouped together in various ways: xmlschema and xsl are working-group related, they together with various others are W3C-related, some projects are work-related and others not, and so forth. In practice, at present this information is hard-coded into the processing software for this vocabulary; if the vocabulary were to be used in a larger context it would be important to provide standard ways to get at (and alter) these interrelationships.

Our final version of the stylesheet for logical form will do just that: it will include, as part of the logical translation, some simple inference rules which capture some of the important relations in the design. Here, for simplicity's sake we will use Prolog notation for the inference rules.

Figure 24: Static inference rules
 <xsl:template name="rules">
contains_star(P1,P2) :- contains(P1,P2).
contains_star(P1,P2) :- contains(P1,P3), contains_star(P3,P2).
includes_star(P1,L) :- includes(P1,L).
includes_star(P1,L) :- contains_star(P1,P2), includes(P2,L).

work_related(Project) :- w3c_related(Project).
work_related(Project) :- prof_related(Project).
work_related(ovhd).

w3c_related(Project) :- wg_related(Project).
w3c_related(xmlcg).
w3c_related(arch).
w3c_related(w3c).

wg_related(xmlschema).
wg_related(xsl).

prof_related(mep).
prof_related(extreme).
prof_related(prof).
 </xsl:template>

This code is used in < “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >

The rules here are fairly simple, but worth making explicit in a simple notation like this one; embedding them solely in application logic is a good way to make them hard to check and hard to reuse without accidental modification.

Overall structure

The overall structure of the stylesheet is much as before, with added pointers to the new bits:

The template for the document root now makes several calls to apply-templates: one for existential quantification, one for the basic processing covered in earlier versions, and one for containment relations:

Figure 26: Handling the document root
 <xsl:template match="/">
  <xsl:text>/* Logical representation of time log data &nl;</xsl:text>
  <xsl:text> * generated by timelog.to.logic.xsl &nl;</xsl:text>
  <xsl:text> */&nl;&nl;</xsl:text>
  <xsl:apply-templates mode="existence"/>
  <xsl:text>&nl;</xsl:text>
  <xsl:apply-templates/>
  <xsl:text>&nl;</xsl:text>
  <xsl:apply-templates mode="containment"/>
  <xsl:text>&nl;</xsl:text>
 </xsl:template>

This code is used in < “Stylesheet 0.6: logical form [File log.logic.06.xsl] ” 25 >
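The root template as shown does not itself invoke the named template rules of Figure 24. One way to emit the inference rules together with the generated facts is to add a call-template instruction to the root template. The following is a minimal sketch under that assumption, not necessarily the arrangement used in log.logic.06.xsl; the rules template is abbreviated to a single clause to keep the example short.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates mode="existence"/>
    <xsl:apply-templates/>
    <xsl:apply-templates mode="containment"/>
    <!-- Assumption: the static rules are appended after the generated facts -->
    <xsl:call-template name="rules"/>
  </xsl:template>

  <!-- Abbreviated stand-in for the rules template of Figure 24 -->
  <xsl:template name="rules">
contains_star(P1,P2) :- contains(P1,P2).
  </xsl:template>

  <!-- Suppress character data in all three modes -->
  <xsl:template match="text()"/>
  <xsl:template match="text()" mode="existence"/>
  <xsl:template match="text()" mode="containment"/>
</xsl:stylesheet>
```

Because the rules do not depend on the input document, they are emitted once per transformation, whatever the size of the time log.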

XSLT to generate RDF

Generating an RDF equivalent of the material shown above in logical form is not difficult. In the interests of simplifying comparison, we will retain the same structure and division into code fragments that we had above. The overall structure of the stylesheet is simple:

The beginning of the stylesheet is just as before, except for the comment.

Figure 28: Stylesheet DTD and start-tag
<?xml version='1.0'?>
<!DOCTYPE xsl:stylesheet PUBLIC 'http://www.w3.org/1999/XSL/Transform'
      '../../../People/cmsmcq/lib/xslt10.dtd' [
<!ENTITY nl "&#xA;">
<!ENTITY rdfns 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<!ENTITY timelogns 'http://example.org/mcxrx/timelog#'>
<!ENTITY people 'http://example.org/mcxrx/people#'>
<!ENTITY projects 'http://example.org/mcxrx/projects#'>
<!ENTITY categories 'http://example.org/mcxrx/categories#'>
]>
<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://example.org/mcxrx/timelog#"
>

<!--* A stylesheet which translates timelog data into RDF.
    * Revisions:
    * 2004-04-16 : make file
*-->

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

The output declaration shifts from text/plain to text/xml.

Figure 29: Output declaration
 <xsl:output method="xml" media-type="text/xml"/>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

The keys for project and category codes are retained; they are used below to avoid assigning a type to the same project or category more than once.

The template which matches the document root must now generate a root element of rdf:RDF; otherwise it is the same as before.

Figure 30: Generating the root RDF element
 <xsl:template match="/">
  <xsl:comment>* RDF representation of time log data *</xsl:comment>
  <xsl:comment>* Generated by log.rdf.xsl            *</xsl:comment>
  <xsl:text>&nl;&nl;</xsl:text>
  <rdf:RDF>
   <xsl:apply-templates mode="existence"/>
   <xsl:text>&nl;</xsl:text>
   <xsl:apply-templates/>
   <xsl:text>&nl;</xsl:text>
   <xsl:apply-templates mode="containment"/>
   <xsl:text>&nl;</xsl:text>
  </rdf:RDF>
 </xsl:template>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

The RDF fragment shown above puts the properties of the time chunk into subelements. That can be done with this template.

Figure 31: Handling the log entries
<xsl:template match="logentry">
<Chunk rdf:ID="{concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))}"> 
      <who rdf:resource='&people;NN'/>
      <date><xsl:value-of select="@date"/></date> 
      <start><xsl:value-of select="@start"/></start> 
      <end><xsl:value-of select="@end"/></end> 
      <dur><xsl:value-of select="@dur"/></dur>
      <project rdf:resource="&projects;{@project}"/>
      <category rdf:resource="&categories;{@category}"/>
      <description><xsl:value-of select='string(.)'/></description>
</Chunk>
</xsl:template>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >
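The value of an rdf:ID attribute must be an XML name without colons (an NCName), so identifiers containing ':' or '/' may be rejected by RDF parsers. The following minimal sketch shows one possible workaround. It assumes that replacing ':' with '.' and '/' with '_' still yields unique identifiers for this data; the named template chunk-id is hypothetical and is not part of the stylesheet described here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://example.org/mcxrx/timelog#">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: identifier for the current logentry,
       with ':' mapped to '.' and '/' mapped to '_' -->
  <xsl:template name="chunk-id">
    <xsl:value-of select="translate(concat('NN_', @date, 'T',
                  normalize-space(@start), '/',
                  normalize-space(@end)), ':/', '._')"/>
  </xsl:template>

  <xsl:template match="/">
    <rdf:RDF>
      <xsl:apply-templates select="//logentry"/>
    </rdf:RDF>
  </xsl:template>

  <xsl:template match="logentry">
    <!-- call-template leaves the context node unchanged, so the
         helper sees the attributes of this logentry -->
    <xsl:variable name="id">
      <xsl:call-template name="chunk-id"/>
    </xsl:variable>
    <Chunk rdf:ID="{$id}">
      <date><xsl:value-of select="@date"/></date>
      <start><xsl:value-of select="@start"/></start>
      <end><xsl:value-of select="@end"/></end>
    </Chunk>
  </xsl:template>
</xsl:stylesheet>
```

For a log entry dated 2004-03-23 running from 17:01 to 17:32, this yields the identifier NN_2004-03-23T17.01_17.32. Any template that refers to a chunk (for example through rdf:resource) would need to build its reference with the same helper.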

Handling the period elements is straightforward. If the attributes workdays etc. are not specified, we do nothing special for them (so this rule is simpler than that for the logical form).

Figure 32: Handling the period elements
<xsl:template match="period|timelog">
  <Period rdf:ID="{generate-id()}">
      <label><xsl:value-of select="@label"/></label>
      <xsl:if test="@workdays">
        <workdays><xsl:value-of select="@workdays"/></workdays>
      </xsl:if>
      <xsl:if test="@holidays">
        <holidays><xsl:value-of select="@holidays"/></holidays>
      </xsl:if>
      <xsl:if test="@satsuns">
        <satsuns><xsl:value-of select="@satsuns"/></satsuns>
      </xsl:if>
  </Period>
  <xsl:apply-templates/>
</xsl:template>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

Containment information could be specified in the main template for periods, but to keep the RDF stylesheet closely parallel to the other, we'll encode it separately.

Figure 33: Capturing containment information
 <xsl:template match="period" mode="containment">
  <!-- rdf:about (not rdf:ID): the Period has already been given
       its rdf:ID by the main template for periods -->
  <Period rdf:about="#{generate-id(..)}">
    <contains rdf:resource="#{generate-id()}"/>
  </Period>
  <xsl:apply-templates mode="containment"/>
 </xsl:template>

 <xsl:template match="logentry" mode="containment">
  <Period rdf:about="#{generate-id(..)}">
    <includes rdf:resource="#{concat('NN_',@date,'T',
           normalize-space(@start),'/',
           normalize-space(@end))}"/>
  </Period>
 </xsl:template>
 <xsl:template match="p" mode="containment"/>
 <xsl:template match="text()" mode="containment"/>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

The unary predicates we wrote to specify the types of various individuals in the logical system have their analogue in the rdf:type property. We don't need to write anything in the stylesheet for periods and time chunks, however, since

<Chunk rdf:ID="NN_2004-03-23T17.01/17.32">
  ...
</Chunk>
is interpreted as an abbreviation of
<rdf:Description rdf:ID="NN_2004-03-23T17.01/17.32">
  <rdf:type rdf:resource="http://example.org/mcxrx/timelog#Chunk"/>
  ...
</rdf:Description>
in which rdf:type is already specified. If, however, we wish to say something about the projects and categories, we can do so, by assigning an RDF type to the resources we name as projects and categories.

Figure 34: Existential quantification of time chunks, projects, and categories
 <xsl:template match="logentry" mode="existence">
  <xsl:if test="generate-id() = generate-id(key('projectcodes',@project)[1])">
    <rdf:Description rdf:about="&projects;{@project}">
      <rdf:type rdf:resource="&timelogns;project"/>
    </rdf:Description>
  </xsl:if>
  <xsl:if test="generate-id() = generate-id(key('categories',@category)[1])">
    <rdf:Description rdf:about="&categories;{@category}">
      <rdf:type rdf:resource="&timelogns;category"/>
    </rdf:Description>
  </xsl:if>
 </xsl:template>
 <xsl:template match="p" mode="existence"/>
 <xsl:template match="text()" mode="existence"/>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

Strictly speaking, this says of the resources named http://example.org/mcxrx/projects#xmlschema and so on that they are of type http://example.org/mcxrx/timelog#project, and similarly of the resources named http://example.org/mcxrx/categories#agenda and so on that they are of type http://example.org/mcxrx/timelog#category. This may not be perfect, but it does seem like a reasonable approximation of the logical form given earlier, over which it has the advantage of avoiding unintentional conflict with other vocabularies of projects and categories, as well as avoiding conflict with other vocabularies of types. If the URIs used can be dereferenced, the RDF form also provides a convenient link to a data dictionary (and encourages the system implementor to provide such a dictionary).

Suppressing character data works the same way here as in the earlier stylesheets.

Figure 35: Suppressing character data
 <xsl:template match="p"/>
 <xsl:template match="text()"/>

This code is used in < “Stylesheet R: RDF export [File log.rdf.xsl] ” 27 >

Attaching annotations to schemas

In the previous section, we have illustrated how XSLT can be used to specify the mapping from a simple colloquial XML vocabulary into logical form and RDF, and thus to document the vocabulary, at least for those able to understand both the XSLT and the RDF it generates. To use such mappings in practice, of course, it's necessary to know what mappings exist and know how to find them. Although they may have little theoretical interest, therefore, mechanisms for associating the mapping (in the form of the XSLT stylesheet) with the source vocabulary are of great practical importance.

Several methods have been proposed for associating arbitrary information with the definitions of XML vocabularies; none has achieved usage quite as wide as its more enthusiastic proponents would have wished. Among those which come most readily to mind are:

  • the xsd:annotation element defined by the XML Schema specification
  • the use of processing instructions in DTDs described by [Ramalho et al. 1999]
  • the Resource Directory Description Language (RDDL) described by [Borden/Bray 2002]
  • the Schema Adjunct Framework described by [Vorthmann/Buck 2000a], [Vorthmann/Buck 2000b], and [Vorthmann/Robie 2001]
  • the use of XLink to assert the association
  • the use of RDF to assert the association
  • the use of a processing instruction with a specified target and appropriate pseudo-attributes in the XML document instance (by analogy with the xml-stylesheet processing instruction)
  • the use of a specific link type within a document (this could use XLink syntax or be sui generis like the HTML link element)
Any of these methods can be used to associate mappings like those shown above with a vocabulary or with an instance of a vocabulary. Most of them can be adapted in obvious ways for asserting the association
  • within an instance of the vocabulary
  • within the definition (or within a definition) of the vocabulary
  • outside of either (like an external XLink or other standoff annotation)

One may hesitate for a while over the correct way to describe these mappings in any formal annotation. The mappings describe a relation between constructs in some source vocabulary and constructs in some target vocabulary, but what identifiers should be used to denote the vocabularies?

The target vocabulary is fairly simple: there is no well known or standard URI denoting either first-order predicate calculus or Prolog, and so there is really no choice but to make one up and document it, which we do in the following sentence. In the examples which follow, we use the URI http://example.org/mcxrx/transforms#prolog to denote the notation of the Prolog programming language.

The source vocabulary poses a tricky question. One possibility is to use the URI of the schema document defining the vocabulary (for our example, both http://example.org/mcxrx/timelog.dtd and http://example.org/mcxrx/timelog.xsd); there may be other schema documents, as well. Under normal circumstances, however, a vocabulary with several schema documents is intended to be, held to be, and treated as a single vocabulary with a single intended meaning.9 The mappings are constructed so as to preserve that meaning, and hence they ought to apply regardless of the mechanisms used for validation. The mappings document the vocabulary, not the schema.10

A second possibility would be to use the namespace name (for our example, http://example.org/mcxrx/timelog#), with the intent of saying more explicitly that the mapping goes from (the vocabulary associated with) the namespace to the target syntax. The only problem is that it seems quite likely that typical mappings will produce acceptable results only for syntactically correct source documents. The mappings above rely on logentry elements having values for the required attributes, and they require period and logentry elements alike to occur as children of elements denoting time periods. A source document which placed period elements inside of log entries, rather than the other way around, would not be translated successfully to either target syntax. This is an instance of the general rule that the role of semantics is to provide interpretations for well-formed utterances in a language, not for all strings of symbols in a given alphabet. That would seem to suggest that it is not so absurd as it might at first appear to identify the source language of the mapping with the schema document rather than with the namespace.

Multi-namespace vocabularies (for mixed-namespace documents) pose another challenge; in some cases, the points at which namespace crossings occur are carefully designed and have special semantics (consider, for example, the namespace crossings in a typical XSLT stylesheet); it seems likely that a single mapping document may be the right way to capture the semantics of such a mixed vocabulary and it may be easiest and clearest to associate that mapping with the schema document defining the mixed namespace rather than with any one (or with all) of the namespaces in the mix. In other cases, the namespace crossings are equally carefully designed but specified in such a way as to allow mixed-namespace documents to be interpreted by simply composing the individual mappings for the individual namespaces. In these cases, it may be easiest and clearest to associate the mappings with the namespace names.

In practice, we may expect both of these approaches to be used with more or less success; both are defensible and neither is obviously the only right solution. Even if one believes that one approach is distinctly preferable, the other approach may be regarded without too much trouble as an instance of metonymy: taking the schema document as a proxy for the namespace which it (partially) defines, or vice versa.

In the following sections, we illustrate the use of several methods for associating an XSLT mapping with a vocabulary: first RDDL, then free-standing RDF, and finally the xsd:annotation element (in several variants). In each example, we identify the namespace, rather than the schema, as the source syntax for the mapping.

Using RDDL to assert the association

When a particular vocabulary is defined as a namespace and the owner of the namespace makes a RDDL document dereferenceable at the URI which serves as the namespace name, the rddl:resource element may be used to associate one or more mapping files with the vocabulary, in the same way that RDDL allows one to associate documentation, schemas and document type definitions in various forms, stylesheets, and processing software with the vocabulary.

The namespace document for our timelog vocabulary might read in part:

<p>Resources relevant to this vocabulary include:</p>
<ul>
  <li><rddl:resource xlink:href="http://example.org/mcxrx/timelog#"
        xlink:role="http://www.tei-c.org/P4X/"
        xlink:arcrole="reference"
        xlink:title="documentation">
      human-readable documentation</rddl:resource></li>
  <li><rddl:resource xlink:href="http://example.org/mcxrx/timelog.dtd"
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
        xlink:arcrole="http://www.rddl.org/purposes/#validation"
        xlink:title="DTD">
      XML 1.0 DTD in bracket-bang notation</rddl:resource></li>
   <li><rddl:resource xlink:href="http://example.org/mcxrx/log.logic.xsl"
        xlink:role="http://www.w3.org/1999/XSL/Transform"
        xlink:arcrole="http://example.org/mcxrx/purposes#logic-mapping"
        xlink:title="logical form">
      XSLT stylesheet showing mapping into logical form</rddl:resource></li>
  <li><rddl:resource xlink:href="http://example.org/mcxrx/log.rdf.xsl"
        xlink:role="http://www.w3.org/1999/XSL/Transform"
        xlink:arcrole="http://example.org/mcxrx/purposes#RDF-mapping"
        xlink:title="RDF mapping">
      XSLT stylesheet showing mapping into RDF</rddl:resource></li>
</ul>
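Once the association is recorded in this form, software can retrieve it. As a minimal sketch (the stylesheet is ours and is not part of RDDL), the following XSLT lists every resource in a RDDL namespace document whose xlink:role identifies it as an XSLT stylesheet, together with the purpose given in xlink:arcrole. It assumes the namespace bindings for the rddl and xlink prefixes shown in its start-tag.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rddl="http://www.rddl.org/"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Select RDDL resources whose nature (xlink:role) is XSLT -->
    <xsl:for-each select="//rddl:resource[@xlink:role =
                          'http://www.w3.org/1999/XSL/Transform']">
      <!-- Print the purpose, then the location of the stylesheet -->
      <xsl:value-of select="@xlink:arcrole"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="@xlink:href"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a namespace document containing the list above, this would report the two mapping stylesheets and skip the documentation and DTD entries.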

Using RDF to assert the association

The RDF data model provides a simple yet powerful and flexible assertion language for representing this kind of association. Using an illustrative 'ex' namespace, which would have to declare the terms used in the description, the association might look like the following:

<ex:NameSpace rdf:about="http://example.org/mcxrx/timelog#">
  <dc:title>Simple timelog namespace</dc:title>
</ex:NameSpace>

<ex:XSLTTransform rdf:about = "http://example.org/mcxrx/log.logic.xsl">
  <dc:title>Simple XSLT transformation of timelog data into logic</dc:title>
</ex:XSLTTransform>

<ex:XSLTTransform rdf:about = "http://example.org/mcxrx/log.rdf.xsl">
  <dc:title>Simple XSLT transformation of timelog data into RDF</dc:title>
</ex:XSLTTransform>

<rdf:Description>
  <dc:title>Logic Mapping of simple timelog namespace to Prolog</dc:title>
  <ex:source rdf:resource = "http://example.org/mcxrx/timelog#" />
  <ex:target rdf:resource = "http://example.org/mcxrx/transforms#prolog" />
  <ex:transform rdf:resource = "http://example.org/mcxrx/log.logic.xsl" />
</rdf:Description>

<rdf:Description>
  <dc:title>Logic Mapping of simple timelog namespace to RDF</dc:title>
  <ex:source rdf:resource = "http://example.org/mcxrx/timelog#" />
  <ex:target rdf:resource = "http://example.org/mcxrx/transforms#RDF" />
  <ex:transform rdf:resource = "http://example.org/mcxrx/log.rdf.xsl" />
</rdf:Description>

Using xsd:annotation to assert the association

The xsd:annotation element can occur both at the top level of a schema document, and within virtually any element which defines a schema component. A mapping file associated with the entire vocabulary being defined in the schema document can conveniently be placed in a top-level annotation, within the xsd:documentation element which contains documentation intended for human readers.

If the intent is simply to make it possible for humans to find the mapping file, it can simply be mentioned in prose. The annotation element shown in the schema document above might be modified to read:

 <xsd:annotation>
  <xsd:documentation>
   A simple schema for timelog data.
   Revisions:
   2004-03-24: specify schema
   2002-01-04: design XML vocabulary to capture old timelog data

   An XSLT stylesheet showing a mapping from this vocabulary into
   logical form (strictly speaking, into Prolog) is at 

     http://example.org/mcxrx/log.logic.xsl

   A similar stylesheet mapping into RDF is at

     http://example.org/mcxrx/log.rdf.xsl

  </xsd:documentation>
 </xsd:annotation>

It might be more useful, however, if the linkage were machine-processable. An XML vocabulary might be invented and suitable elements inserted into the xsd:appinfo element which occurs as the child of xsd:annotation. If we use elements which conform to the XLink rules for extended links, we might have something like this:

 <xsd:annotation>
  <xsd:documentation>
   A simple schema for timelog data.
   Revisions:
   2004-03-24: specify schema
   2002-01-04: design XML vocabulary to capture old timelog data
  </xsd:documentation>
  <xsd:appinfo>
   <mappings>
    <map xlink:type="extended">
      <source 
         xlink:type="locator" 
         xlink:role="http://example.org/mcxrx/roles#source-vocabulary"
         xlink:href="http://example.org/mcxrx/timelog#"  />
      <mapping
         xlink:type="locator" 
         xlink:role="http://example.org/mcxrx/roles#semantic-map"
         xlink:href="http://example.org/mcxrx/log.logic.xsl"  />
      <target 
         xlink:type="locator" 
         xlink:role="http://example.org/mcxrx/roles#target-vocabulary"
         xlink:href="http://example.org/mcxrx/transforms#prolog">Prolog</target>
    </map>
    <map xlink:type="extended">
      <source 
         xlink:type="locator" 
         xlink:role="http://example.org/mcxrx/roles#source-vocabulary"
         xlink:href="http://example.org/mcxrx/timelog#"  />
      <mapping
         xlink:type="locator" 
         xlink:role="http://example.org/mcxrx/roles#semantic-map"
         xlink:href="http://example.org/mcxrx/log.rdf.xsl"  />
      <target 
         xlink:type="locator" 
         xlink:role="http://example.org/mcxrx/roles#target-vocabulary"
         xlink:href="http://example.org/mcxrx/transforms#RDF">RDF/XML</target>
    </map>
   </mappings>
  </xsd:appinfo>
 </xsd:annotation>
The drawback of this method is that since there is no public specification or proposal for the mappings element and its children (this paper is not such a proposal), no one but the author knows exactly what this entry in the xsd:appinfo element means.
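Even without a public specification for the element names, the XLink roles make the entries mechanically retrievable. The following minimal sketch (our own illustration, assuming the role URIs used in the example above and a well-formed schema document) lists each source, target, and mapping stylesheet recorded in xsd:appinfo. It selects elements by their XLink attributes rather than by name, so it does not depend on the namespace, if any, of the mappings element and its children.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink">

  <xsl:output method="text"/>

  <!-- Common prefix of the role URIs used in the example -->
  <xsl:variable name="roles" select="'http://example.org/mcxrx/roles#'"/>

  <xsl:template match="/">
    <!-- One line of output per extended link found in any appinfo -->
    <xsl:for-each select="//xsd:appinfo//*[@xlink:type = 'extended']">
      <xsl:value-of select="*[@xlink:role =
          concat($roles, 'source-vocabulary')]/@xlink:href"/>
      <xsl:text> maps to </xsl:text>
      <xsl:value-of select="*[@xlink:role =
          concat($roles, 'target-vocabulary')]/@xlink:href"/>
      <xsl:text> via </xsl:text>
      <xsl:value-of select="*[@xlink:role =
          concat($roles, 'semantic-map')]/@xlink:href"/>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A consumer would still need out-of-band knowledge of what the three roles mean, which is precisely the limitation noted above.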

It may be more plausible, therefore, to use an existing vocabulary designed for the purpose. We can, for example, place the relevant RDDL information into the xsd:annotation element. Since RDDL is intended to be both human- and machine-readable, it could in theory go in either place. By analogy with the use of RDDL in namespace documents, however, we here illustrate its use (mixed with XHTML) in the xsd:documentation element:

 <xsd:annotation>
  <xsd:documentation>
   <div xmlns="http://www.w3.org/1999/xhtml"
        xmlns:rddl="http://www.rddl.org/"
        xmlns:xlink="http://www.w3.org/1999/xlink"
   >
    <p>Sample schema for use in XML Schema
     tutorial, Sydney, 18 August 2003.</p>
    <p>A simple schema for timelog data.</p>
    <p>Revisions:</p>
    <ul>
     <li>2004-03-24: specify schema</li>
     <li>2002-01-04: design XML vocabulary to capture old timelog data</li>
    </ul>
    <p>Resources relevant to this vocabulary include:</p>
    <ul>
      <li><rddl:resource xlink:href="http://example.org/mcxrx/timelog#"
                         xlink:role="http://www.tei-c.org/P4X/"
                         xlink:arcrole="reference"
                         xlink:title="documentation">
          human-readable documentation</rddl:resource></li>
      <li><rddl:resource xlink:href="http://example.org/mcxrx/timelog.dtd"
                         xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
                         xlink:arcrole="http://www.rddl.org/purposes/#validation"
                         xlink:title="DTD">
          XML 1.0 DTD in bracket-bang notation</rddl:resource></li>

      <li><rddl:resource xlink:href="http://example.org/mcxrx/log.logic.xsl"
                         xlink:role="http://www.w3.org/1999/XSL/Transform"
                         xlink:arcrole="http://example.org/mcxrx/purposes#logic-mapping"
                         xlink:title="logical form">
          XSLT stylesheet showing mapping into logical form</rddl:resource></li>
      <li><rddl:resource xlink:href="http://example.org/mcxrx/log.rdf.xsl"
                         xlink:role="http://www.w3.org/1999/XSL/Transform"
                         xlink:arcrole="http://example.org/mcxrx/purposes#RDF-mapping"
                         xlink:title="RDF mapping">
          XSLT stylesheet showing mapping into RDF</rddl:resource></li>
    </ul>
   </div>
  </xsd:documentation>
 </xsd:annotation>

We could equally well embed the relevant RDF into the xsd:annotation element. Placing it in the xsd:appinfo element, we might get something like this:

 <xsd:annotation>
  <xsd:documentation>
   <div xmlns="http://www.w3.org/1999/xhtml"
   >
    <p>Sample schema for use in XML Schema
     tutorial, Sydney, 18 August 2003.</p>
    <p>A simple schema for timelog data.</p>
    <p>Revisions:</p>
    <ul>
     <li>2004-03-24: specify schema</li>
     <li>2002-01-04: design XML vocabulary to capture old timelog data</li>
    </ul>
   </div>
  </xsd:documentation>
  <xsd:appinfo>
    <ex:NameSpace rdf:about="http://example.org/mcxrx/timelog#">
      <dc:title>Simple timelog namespace</dc:title>
    </ex:NameSpace>
    
    <ex:XSLTTransform rdf:about = "http://example.org/mcxrx/log.logic.xsl">
      <dc:title>Simple XSLT transformation of timelog data into logic</dc:title>
    </ex:XSLTTransform>
    
    <ex:XSLTTransform rdf:about = "http://example.org/mcxrx/log.rdf.xsl">
      <dc:title>Simple XSLT transformation of timelog data into RDF</dc:title>
    </ex:XSLTTransform>
    
    <rdf:Description>
      <dc:title>Logic Mapping of simple timelog namespace to Prolog</dc:title>
      <ex:source rdf:resource = "http://example.org/mcxrx/timelog#" />
      <ex:target rdf:resource = "http://example.org/mcxrx/transforms#prolog" />
      <ex:transform rdf:resource = "http://example.org/mcxrx/log.logic.xsl" />
    </rdf:Description>
    
    <rdf:Description>
      <dc:title>Logic Mapping of simple timelog namespace to RDF</dc:title>
      <ex:source rdf:resource = "http://example.org/mcxrx/timelog#" />
      <ex:target rdf:resource = "http://example.org/mcxrx/transforms#RDF" />
      <ex:transform rdf:resource = "http://example.org/mcxrx/log.rdf.xsl" />
    </rdf:Description>
  </xsd:appinfo>
 </xsd:annotation>

Discussion and further work

In the previous section we give examples of how the designer of an XML vocabulary, or others, can ‘annotate’ the vocabulary and show how to map data expressed in the vocabulary into other target syntaxes, which can be taken as proxies for specific target models.

When the mappings are defined by the vocabulary designer or schema author, we have shown how to associate the mappings with the vocabulary using the xsd:annotation mechanism of XML Schema 1.0. When they are defined by others, we have shown how the association can be performed using stand-off annotations; further work is required, though, to explore mechanisms for allowing other parties to learn of and acquire such mappings.

One of the most obvious and important uses for such mappings is to enable data conforming to specialized or localized vocabularies to be translated into more widely understood forms. Well designed XML vocabularies typically exploit regularities in the information being captured or in the work processes used to produce and manipulate the XML in order to make the notation concise, easy to understand in context, and easy to process. When data are reused or processed outside their original context, however, the regularities exploited by the designer of the vocabulary may no longer exist, and the notation will accordingly seem hard to understand and arbitrary in its meanings. Translation to a common reference model like RDF serves to make at least some of the implicit assumptions embedded in colloquial XML vocabularies more explicit, and to make the data more easily reusable in new applications and more easily comprehensible to larger communities.

It is for this reason that mappings such as those described here may be important in the development of a more intelligent network of humans and machines.

In the interests of clarifying some of the issues involved in specifying such mappings, several points may be made about the mappings shown above in the main part of this paper.

The example shows that XSLT can be used successfully for defining such mappings. But while XSLT is a powerful language for XML transformation, we expect that additional means for annotating schemas to reference transformation services and other code may also be required, and further exploration of the problem area and possible solutions is needed.

Although both the source vocabulary and the target syntaxes are almost trivially simple, the single example given in this version of this paper has surfaced a number of potentially tricky issues. The simplicity of the example has allowed us to get what feels like a reasonably good grip on some of them. But it is important to test any solution not just on data which already closely resembles the design of a relational database, but on data with less predictable structures.

One point which distinguishes the mappings shown here from most other proposals for such mappings which we have seen is that the mappings here are from tuple to tuple (in the Prolog translation) or from tuple to set of properties (in the RDF translation), rather than from individual construct to individual construct. That is, the mappings above do not contain anything interpretable as

  • The date attribute maps to the date_of_chunk property.
  • The start attribute maps to the start_time property.
  • The end attribute maps to the end_time property.
  • etc.

although many proposals for capturing the semantics of XML vocabularies take precisely this point-to-point approach.

Instead, the mappings shown above for the logentry element map an entire tuple at a time. In simple examples (in particular, for examples already in something like third normal form) the difference may be subtle, but in more complex examples it is crucial: what is added to a relational table (for example) is not an individual column value but a row of column values, and no mapping can succeed unless it provides a way to tell which values go with which other values to form a row. In RDF terms, a point-to-point specification of a mapping does not provide enough information: it does not tell us what resource the date_of_chunk, start_time, and end_time properties are supposed to be attached to. In the case of our sample timelog data, the answer is that it is the resource whose ID is generated from the logentry element on which the particular date and time values appear; this is explicit in the XSLT mappings given above, but cannot be explicit in a mapping language which allows only the documentation of point-to-point correspondences.11

In the mapping to logic, each template which generates tuples provides three kinds of explicit information:

  • The template element's match attribute specifies which kind of elements will be mapped to tuples (and the mode attribute, if present, specifies the XSLT mode in which this will happen).
  • The literal strings within the template specify the predicate (if we think in Prolog terms) or the name of the relation (if we think in relational terms).
  • The embedded xsl:value-of elements and the values of their select attributes specify the values to be taken as arguments to the predicate (or in other words the values to be written into the columns of the row in the relation).
    • The order of the xsl:value-of elements maps the values to positions in the parameter list.
    • The XPath expressions in the select attributes specify the location of the desired value relative to the node matched by the template.12

Additionally, there is some implicit information given by the execution model and the declarative semantics of XSLT:

  • The rules governing template-matching, recursion, and modes determine when particular templates match particular elements; indirectly, therefore, they determine just how many clauses are generated in the Prolog syntax.

In addition, special conditions on the transformation (e.g. the rules which allow us to generate just one category(agenda) clause from the document, instead of several) are handled by using the flow-of-control features built into XSLT.
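
The three kinds of explicit information listed above can be seen together in a minimal sketch of a tuple-generating template. It is an illustration only, not the stylesheet used for the mappings in this paper: the predicate name timechunk is hypothetical, and we assume that the date, start, and end attributes appear directly on the logentry element and contain no apostrophes (which would have to be escaped in a quoted Prolog atom).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The target is Prolog text, not XML. -->
  <xsl:output method="text"/>

  <!-- (1) The match attribute says which elements become tuples. -->
  <xsl:template match="logentry">
    <!-- (2) The literal text supplies the predicate name.
         'timechunk' is a hypothetical name chosen for this sketch. -->
    <xsl:text>timechunk('</xsl:text>
    <!-- (3) Each xsl:value-of supplies one argument; document order
         fixes the argument position, and the XPath expression is
         evaluated relative to the matched logentry element. -->
    <xsl:value-of select="@date"/>
    <xsl:text>', '</xsl:text>
    <xsl:value-of select="@start"/>
    <xsl:text>', '</xsl:text>
    <xsl:value-of select="@end"/>
    <xsl:text>').&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress text nodes, so that the built-in rules only walk the
       tree and the output contains nothing but generated clauses. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Because the whole clause is written by a single template, the three values are guaranteed to come from the same logentry element; the implicit information (how many clauses are produced) comes from the built-in template rules, which visit every element and so fire the template once per logentry.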

In the RDF mapping, similar considerations apply. Some important information is explicit:

  • For templates which generate elements in the RDF output, the template element's match attribute specifies which kind of elements will be mapped to node elements in the RDF (and thus, indirectly, specifies which elements map to resources as RDF defines the term).
  • The literal result elements within the template, which turn into property elements in the RDF-XML output, specify
    • the type of resource mapped to
    • the properties associated with the resource in question
  • The embedded xsl:value-of elements and the values of their select attributes specify the values to be assigned to the properties; the order of xsl:value-of elements and literal result elements in the template determines which values go with which properties, and the XPath expressions in the select attributes specify where to find the values in the input document, or how to calculate them from data in the input document.
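
A comparable sketch for the RDF case may make these points concrete. Again it is an illustration under stated assumptions rather than the stylesheet referred to above: the tl namespace URI, the class name TimeChunk, and the URI pattern used in rdf:about are all hypothetical, the property names are borrowed from the discussion of point-to-point mappings, and the date, start, and end attributes are assumed to appear directly on logentry with values that are safe to embed in a URI.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:tl="http://example.org/mcxrx/timelog#">
  <!-- The tl namespace above is an assumption made for this sketch. -->

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap all generated node elements in a single rdf:RDF element. -->
  <xsl:template match="/">
    <rdf:RDF>
      <xsl:apply-templates select="//logentry"/>
    </rdf:RDF>
  </xsl:template>

  <!-- The match attribute says which elements map to resources. -->
  <xsl:template match="logentry">
    <!-- The literal result element gives the type of the resource.
         The identifier is built from values on this same logentry,
         so every property below is attached to the right resource. -->
    <tl:TimeChunk
        rdf:about="http://example.org/mcxrx/chunk/{@date}/{@start}-{@end}">
      <!-- Each nested literal result element is one property; the
           xsl:value-of inside it locates the value in the input. -->
      <tl:date_of_chunk><xsl:value-of select="@date"/></tl:date_of_chunk>
      <tl:start_time><xsl:value-of select="@start"/></tl:start_time>
      <tl:end_time><xsl:value-of select="@end"/></tl:end_time>
    </tl:TimeChunk>
  </xsl:template>

</xsl:stylesheet>
```

The nesting of the property elements inside the node element is what a point-to-point mapping cannot express: it states not only that the date attribute corresponds to date_of_chunk, but also which resource that property belongs to.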

On the whole, it has proven possible in this example to specify mappings which sustain a mostly declarative interpretation, and the mappings have proven relatively concise and straightforward.

On the other hand, it has to be said that the generality of XSLT brings a certain amount of machinery with it, and the templates in the style sheets certainly do not seem to be the most concise method one could imagine of specifying the information necessary for a transform. And in XSLT, as in any other Turing-complete language, there is always the possibility of writing mappings which are not only less concise but also less declarative and less easily understood than those presented here. Specialized vocabularies which exploit the regularities observable in common use cases ought, if properly designed, to be more concise, more reliably declarative, and perhaps reversible (so that they could serve to guide not only transformations from the source vocabulary into RDF or logic, but also transformations in the other direction).13

It is only by getting a better grip on the semantics of markup and markup applications that we can do better at making our markup application-independent and reusable. That is reason enough for XML users of all kinds to be interested in the mapping problem and its solutions. It is only by finding ways to identify and exploit the information encoded in colloquial XML that we can make progress toward a Semantic Web, a Web in which software can understand most information better than software does today; as long as work on the Semantic Web relies solely on non-colloquial XML or other specialized notations, the large majority of existing and future XML will be inaccessible and projects for large-scale data integration will continue to suffer from precisely the same headaches which have always troubled them. That is reason enough for those interested in the Semantic Web to care about annotating schemas for colloquial XML vocabularies.

That schema languages and work on the Semantic Web have points of contact where each relies on and can benefit from the other has long been clear; this realization produced the ‘Cambridge Communiqué’ ([Swick/Thompson 1999]) years ago. Progress toward shared understanding of common problems has been erratic, but the time has perhaps now come when better progress can be made. We hope that the sample mappings described in this paper can contribute toward that ultimate end.

Notes

1.

It is slightly puzzling to observe that while RDF enthusiasts frequently talk aloud about the RDF mapping problem, the topic map community seems to spend less time talking about any need for mechanisms to translate data from colloquial vocabularies into topic maps. See, however, [Freese 2001].

2.

With the possible exception that documenting semantics in machine-processable form may make it easier to be explicit about some things which are easily left vague or underspecified in natural language.

3.

It is perhaps for this reason that some are unwilling to accept the concrete data-structure mapping problem as capturing the meaning of the markup: apart from the fact that the meaning of programming languages and their data constructs is itself a thorny problem, conventional data structures seldom have meanings that could conceivably be equivalent to, say, an HTML cite element, with all that it entails regarding the nature of publication and bibliographic citation. If we accept that the goal of a mapping from XML into concrete data structures is to preserve intact the meaning of the XML, it seems plausible to assume the author of the mapping had either a remarkably narrow view of XML semantics, or a remarkably broad one of the meaning of programming-language data structures.

4.

The string mcxrx is derived from the title of this paper; it has no other significance. The second-level domain example.org is reserved by RFC 2606 [Eastlake/Panitz 1999] for use in documentation and examples. It is not a real domain.

5.

Names and initials are not those of real people or projects.

6.

The practice is named for the logician Haskell Curry, although it was apparently first proposed by Frege and described by Schönfinkel.

7.

It's not just that two chunks ought not to have the same start and end points. If the data are clean, no two chunks should overlap, either. But we are just trying for a unique identifier here, not for a full specification of data validity.

8.

It may be that only logicians and spies are interested in inferences of this kind, and it may not be particularly useful in any practical application to try to make the existential quantifications explicit. But we include this in order to show how the ontological entailments of the markup can be made explicit.

9.

Exceptions can be imagined, but for the moment we choose to regard them as pathological. Difficult non-pathological cases may exist, but we ignore them for now.

10.

One might be tempted to say that the mappings document the semantics of the vocabulary just as the schemas document the syntax. This is not likely to be entirely satisfactory in details, but it seems like a reasonable first approximation of the truth.

11.

Some proposals for mapping languages appear to believe that the answer to such questions is always obvious. The attribute is attached to an element, after all, and if that element or one of its ancestors is held to represent a resource (as here, a log entry is mapped to a time chunk), then surely it's obvious that the property denoted by the attribute is a property of the entity represented by the appropriate ancestor. In practice, we believe this is not obvious enough. In this case, the ancestors of the date attribute include an arbitrary number of elements representing time periods, but the date_of_chunk property does not apply to them. If internal markup were allowed within paragraphs, a date attribute might also appear in other contexts, where it does not map to anything in the target syntax because p elements are suppressed entirely. How is a piece of software to know how to tell the difference?

12.

As [Sperberg-McQueen/Huitfeldt/Renear 2001a] point out, the use of relative position to assert relations is an important idiom in colloquial XML; as any comparison of colloquial and non-colloquial XML will make clear, the positional relations used vary quite a bit in colloquial vocabularies, and the meanings associated with a particular positional relationship vary too. This is why any successful mapping language must provide ways to make argument structure explicit. In the templates of our mappings, the xsl:value-of elements, with the XPath expression in their select attributes, serve as the deictic expressions whose necessity was postulated by [Sperberg-McQueen/Huitfeldt/Renear 2001a].

13.

Like any other specialized vocabulary which exploits regularities and implicit information to make the notation more concise, such specialized mapping vocabularies might be in need of translation into more explicit, more widely understood expressions in commonly understood languages. In this particular case, the meaning of a transformation language might usefully be documented by showing how to transform instances of the language into XSLT stylesheets which perform the mapping in question.


Bibliography

[ACH/ACL/ALLC 1994] Association for Computers and the Humanities, Association for Computational Linguistics, and Association for Literary and Linguistic Computing. 1994. Guidelines for Electronic Text Encoding and Interchange (TEI P3). Ed. C. M. Sperberg-McQueen and Lou Burnard. Chicago, Oxford: Text Encoding Initiative, 1994.

[Berners-Lee/Connolly/Swick 1999] Berners-Lee, T., D. Connolly, and R. Swick, ed. Web Architecture: Describing and Exchanging Data. W3C NOTE 7 June 1999. http://www.w3.org/1999/06/07-WebData

[Borden/Bray 2002] Borden, Jonathan, and Tim Bray. “Resource Directory Description Language (RDDL).” 18 February 2002. http://www.openhealth.org/RDDL/20020218/rddl-20020218.html

[Eastlake/Panitz 1999] Eastlake, D., and A. Panitz. “Reserved Top Level DNS Names”. RFC 2606. June 1999. http://www.rfc-editor.org/rfc/rfc2606.txt

[Fallside 2001] Fallside, David, ed. “XML Schema Part 0: Primer”. W3C Recommendation, 2 May 2001. [Cambridge, Sophia-Antipolis, Tokyo: W3C] http://www.w3.org/TR/xmlschema-0/.

[Freese 2001] Freese, Eric. “Harvesting Knowledge from the Organization's Information Assets”. Paper given at XML Europe 2001, Berlin. http://www.gca.org/papers/xmleurope2001/papers/html/s31-1.html.

[Hazaël-Massieux/Connolly 2004] Hazaël-Massieux, Dominique, and Dan Connolly. “Gleaning Resource Descriptions from Dialects of Languages (GRDDL)”. W3C Coordination Group Note 13 April 2004. [Cambridge, Sophia-Antipolis, Tokyo: W3C] http://www.w3.org/TR/grddl/.

[Krupnikov/Thompson 2001] Krupnikov, K. Ari, and Henry S. Thompson. “Data Binding Using W3C XML Schema Annotations”. Talk at XML 2001, Orlando, December 2001. http://www.ltg.ed.ac.uk/~ht/mapping.html

[Langendoen/Simons 1995] Langendoen, D. Terence, and Gary F. Simons. “Rationale for the TEI recommendations for feature-structure markup”. Computers and the Humanities 29.3 (1995): 191-209.

[Manola/Miller 2004] Manola, F., and E. Miller. “RDF Primer”. W3C Recommendation, 10 February 2004. [Cambridge, Sophia-Antipolis, Tokyo: W3C] http://www.w3.org/TR/rdf-primer/.

[Ogbuji 2001] Ogbuji, Uche. “Thinking XML: Basic XML and RDF techniques for knowledge management. Part 1: Generate RDF using XSLT.” IBM developerWorks, July 2001. http://www-106.ibm.com/developerworks/library/x-think4/

[Ramalho et al. 1999] Ramalho, José Carlos, Jorge Gustavo Rocha, José João Almeida, and Pedro Henriques. 1999. “SGML documents: Where does quality go?” Markup Languages: Theory & Practice 1.1 (1999): 75-90.

[Sperberg-McQueen 1996] Sperberg-McQueen, C. M. “On Information Factoring in Dublin Metadata Records.” http://tigger.uic.edu/~cmsmcq/tech/metadata.factoring.html

[Sperberg-McQueen/Huitfeldt/Renear 2001a] Sperberg-McQueen, C. M., Claus Huitfeldt, and Allen Renear. “Meaning and interpretation of markup.” Markup Languages: Theory & Practice 2.3 (2001): 215-234. http://www.w3.org/People/cmsmcq/2000/mim.html

[Sperberg-McQueen/Huitfeldt/Renear 2001b] Sperberg-McQueen, C. M., Claus Huitfeldt, and Allen Renear. “Practical extraction of meaning from markup.” Paper given at ACH/ALLC 2001, New York, June 2001. (Slides at http://www.w3.org/People/cmsmcq/2001/achallc2001/achallc2001.slides.html)

[Swick/Thompson 1999] Swick, Ralph R., and Henry S. Thompson, ed. The Cambridge Communiqué. W3C NOTE 7 October 1999. http://www.w3.org/TR/schema-arch

[Thompson 2001] Thompson, Henry S. “Normal Form Conventions for XML Representations of Structured Data”. Talk at XML 2001, Orlando, December 2001. http://www.ltg.ed.ac.uk/~ht/normalForms.html

[Vorthmann/Buck 2000a] Vorthmann, Scott, and Lee Buck. “Schema Adjunct Framework: Executive Summary”. 24 February 2000. http://www.tibco.com/software/standards_support/xmlresources/exec_summary.html

[Vorthmann/Buck 2000b] Vorthmann, Scott, and Lee Buck. “Schema Adjunct Framework: Draft Specification”. 24 February 2000. http://www.tibco.com/software/standards_support/xmlresources/spec.html

[Vorthmann/Robie 2001] Vorthmann, Scott, and Jonathan Robie. “Beyond schemas: Schema adjuncts and the outside world”. Markup Languages: Theory & Practice 2.3 (2001): 281-294.



On mapping from colloquial XML to RDF using XSLT

C. M. Sperberg-McQueen [World Wide Web Consortium, MIT Computer Science and AI Laboratory]
Eric Miller [World Wide Web Consortium, MIT Computer Science and AI Laboratory]