Uniform access to infosets via reflection

Henry S. Thompson
K. Ari Krupnikov
Jo Calder

Abstract

The XML Infoset provides an abstract data model for XML documents. W3C XML Schema defined an extension to the Infoset called the PSVI. Proposals for other extensions have been made. We present a universal, interoperable approach to accessing infosets, including extensions, using XPath.

Keywords: XPath; XSD/W3C Schema

Henry S. Thompson

Henry S. Thompson is Reader in Artificial Intelligence and Cognitive Science at the Division of Informatics at the University of Edinburgh, based in the Language Technology Group of the Human Communication Research Centre, and Managing Director of Markup Technology Ltd. He received his Ph.D. in Linguistics from the University of California at Berkeley in 1980. His university education was divided between Linguistics and Computer Science, in which he holds an M.Sc. His research interests have ranged widely, including natural language parsing, speech recognition, machine translation evaluation, modelling human lexical access mechanisms, the fine structure of human-human dialogue, language resource creation, and architectures for linguistic annotation. His current research is focussed on articulating and extending the architectures of XML. He was a member of the SGML Working Group of the World Wide Web Consortium which designed XML; is the author of the XED, the first free XML instance editor, and co-author of the LT XML toolkit; and is currently a member of the XSL and XML Schema Working Groups of the W3C. He currently is a member of the W3C Team, and is lead editor of the Structures part of the XML Schema W3C Recommendation, for which he co-wrote the first publicly available implementation, XSV. He has presented many papers and tutorials on SGML, DSSSL, XML, XSL and XML Schemas in both industrial and public settings over the last five years.

K. Ari Krupnikov

K. Ari Krupnikov has worked extensively with relational databases and data integration tools on systems ranging from IBM System/370 to Linux on PC. He has been involved in XML since 1999 and has published open source XML software. A veteran of several startups, Mr. Krupnikov got his professional start with an elite computer unit of the Israeli Defense Forces. He now divides his time between Scotland, where his research at the University of Edinburgh is focused on applications of XML Schema in object-oriented and relational programming, and independent consulting in California.

Jo Calder

Jo Calder is founder and CEO of Fourth Person, a company specializing in the generation of human language from computer data sources. Following an undergraduate degree in linguistics, his M.Sc. and Ph.D., both at Edinburgh, covered topics in the computer processing of human language. His academic research has included formalisms and implementations for grammatical representation and parsing, visualization and direct manipulation of grammatical resources, and architectures for human language generation. Since its foundation in 2001, Fourth Person has become the foremost European language generation company, and is a significant user of the XML family of technologies.

Henry S. Thompson [University of Edinburgh, Markup Technology, World Wide Web Consortium]
K. Ari Krupnikov [University of Edinburgh]
Jo Calder [Fourth Person]

Extreme Markup Languages 2003® (Montréal, Québec)

Copyright © 2003 Henry S. Thompson, K. Ari Krupnikov, & Jo Calder. Reproduced with permission.

Introduction

"XPath is a language for addressing parts of an XML document", according to the XPath spec [XPath]. In a non-normative appendix, it defines a mapping from XML Infoset [Infoset] to its own data model. W3C XML Schema’s PSVI [XML Schema] defines an extension to the Infoset, i.e., new items and properties, and others have floated proposals for other extensions. In this paper, we propose a standardized mechanism that allows accessing the Infoset, including extensions, through XPath. This mechanism does not change the syntax of XPath nor does it impose any burden on implementations that do not provide Infoset extensions. The connection from XPath data model to Infoset is through an extension function, a facility that XPath implementations already support.

We propose that the Infoset be modeled as a separate, synthetic document in the implementation’s internal representation. We call this synthetic document a reflection of the Infoset. An extension function, which we call reflect, can then provide a mapping from a node in the context to a node in the reflection. Such a function is similar to the XPath id() function, in that it selects a nodeset from an existing tree. It is also similar to the XSLT document() function in that it gives access to a nodeset not explicitly present in the source document.

Different degrees of coupling are possible between an (XPath) processor and a component that provides Infoset access. In an extremely tight coupling, the XPath processor itself is required to produce an Infoset (and, for access to e.g., the PSVI, to construct and check a schema for the source document and carry out schema-validity assessment of the source using that schema). We have prototyped a somewhat looser coupling, utilizing unmodified existing XML Schema and XSLT processors. Our implementation is a two-stage (SAX) pipeline in which the first (schema) stage constructs a standard Infoset plus native PSVI information and the second (XSLT) stage includes the extension which implements just-in-time navigation of that addition. Our implementation of reflect() then finds information in the PSVI associated with nodes in the source document and synthesizes appropriate nodesets which the XSLT processor can operate on.

Reflection

We use the idea of reflection here, in a way modeled on its use in logic and computation, to refer to a process whereby a syntactic form in some language, in our case XML, is analysed and represented in some underlying data model, in our case the XML Infoset, and then the constituents of that representation are themselves expressed using the same syntactic form, in our case XML once again. The figures below will make this clear:

Figure 1: A trivial XML document (about corporate structure, perhaps)
<department number="1"/>
Figure 2: A tabular representation of (an implementation of) its Infoset (incomplete)

Element information item

Property            Value
namespace name
local name          "department"
children            []
attributes          {
                      Attribute information item
                      namespace name
                      local name          number
                      normalized value    1
                    }

Figure 3: One possible reflection
<child namespaceName="" localName="department">
 <attribute namespaceName="" localName="number" normalizedValue="1"/>
</child>

As is illustrated, whereas an XML document in general will consist of terms relevant to the domain of application (in this case, ‘department’ and ‘number’), its reflection will consist of terms relevant to the language’s data model (in this case, ‘child’, ‘attribute’, ‘localName’, and so on).
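The step from Figure 1 to Figure 3 can be sketched in a few lines of Python. This is an illustration only, not the implementation described later: the function name reflect_element is hypothetical, and the sketch assumes a source document without namespaces, so every namespaceName is written as the empty string.

```python
import xml.etree.ElementTree as ET


def reflect_element(elem: ET.Element) -> ET.Element:
    """Build a Figure 3 style reflection of one element (no namespace handling)."""
    # The reflected element uses data-model vocabulary, not domain vocabulary.
    node = ET.Element("child", {"namespaceName": "", "localName": elem.tag})
    # Each attribute of the source becomes an <attribute> element.
    for name, value in elem.attrib.items():
        ET.SubElement(
            node,
            "attribute",
            {"namespaceName": "", "localName": name, "normalizedValue": value},
        )
    # Child elements of the source are reflected recursively.
    for sub in elem:
        node.append(reflect_element(sub))
    return node


source = ET.fromstring('<department number="1"/>')
reflection = reflect_element(source)
print(ET.tostring(reflection, encoding="unicode"))
```

The domain terms ‘department’ and ‘number’ survive only as attribute values; the element and attribute names of the result all come from the data model.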

There are no principled constraints on the document type of a reflection. As with any XML application, a syntax (in the form of a DTD, schema, etc.) and a semantics, that is, a description of how the instances of the type correspond to constituents of the data model, are both required. Other things being equal, a document type in a normal form (see [Tho01]) has the advantage that the syntax follows in a regular way from the structure of the data model, and the semantics follows directly from the type of normal form chosen.

For example, Thompson and Tobin’s exploratory reflection of the XML Infoset [ThoTob01] uses the ANF [Alternating Normal Form] defined in [Tho01], in which elements reflecting infoitems alternate with elements reflecting properties. The equivalent in this normal form to the reflection illustrated in Figure 3 above would be:

Figure 4: Alternating Normal Form illustration (abridged)
<element>
 <namespaceName xsi:nil="true"/>
 <localName>department</localName>
 <children/>
 <attributes>
  <attribute>
   <namespaceName xsi:nil="true"/>
   <localName>number</localName>
   <normalizedValue>1</normalizedValue>
  </attribute>
 </attributes>
</element>

The regular structure of this normal form is apparent even in this simple example: elements at odd-numbered depths (‘element’, ‘attribute’) reflect infoitems, while elements at even-numbered depths (‘localName’, ‘children’, etc.) reflect properties. The tree-structure of the document itself carries information: the property reflected by an element is a property of the infoitem reflected by that element’s parent, and its value is reflected by its child.
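The depth rule can be checked mechanically. The Python sketch below walks the Figure 4 document and labels each element by the parity of its depth; the function name classify is hypothetical, and the xsi namespace declaration (elided in the abridged figure) has been added so that the fragment parses.

```python
import xml.etree.ElementTree as ET

# Figure 4, with the standard xsi namespace declared so the text is well-formed.
ANF = """
<element xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <namespaceName xsi:nil="true"/>
 <localName>department</localName>
 <children/>
 <attributes>
  <attribute>
   <namespaceName xsi:nil="true"/>
   <localName>number</localName>
   <normalizedValue>1</normalizedValue>
  </attribute>
 </attributes>
</element>
"""


def classify(elem, depth=1, out=None):
    """Label each element 'infoitem' (odd depth) or 'property' (even depth)."""
    if out is None:
        out = []
    out.append((depth, elem.tag, "infoitem" if depth % 2 else "property"))
    for sub in elem:
        classify(sub, depth + 1, out)
    return out


labels = classify(ET.fromstring(ANF))
for depth, tag, kind in labels:
    print(depth, tag, kind)
```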

In the work reported here we use a new normal form closely related to Relation Normal Form, as described in [Tho01]. In this normal form, which will be described in greater detail below, attributes encode atomic-valued relations, elements encode individual-valued relations and the types of individuals are not overt. Figure 3 uses this form.

Reflection of Infoset extensions

There are a number of existing and proposed extensions to the inventory of items and properties defined in the Infoset recommendation itself [Infoset]. As long as an extension conforms to the basic item-and-property ontology, there is no problem with extending any approach to reflecting the core Infoset to cover it. For example, extending the alternating normal form approach to cover the PSVI is straightforward, and indeed is included in [ThoTob01]. Since the PSVI includes both new items and new properties, the resulting reflections add considerable new structure.

Not only is it appropriate to use namespaces to manifest the identity of a particular reflection, but when incorporating an extension, a change of namespace is appropriate. Again, an example should help:

Figure 5: Edinburgh Normal Form illustration, with extension (abridged)
<child xmlns="http://www.ltg.ed.ac.uk/2003/06/XMLInfoset"
       xmlns:p="http://www.ltg.ed.ac.uk/2003/06/PSVInfoset"
       namespaceName="" localName="department"
       p:validity="valid" p:validationAttempted="full">
 <attribute namespaceName="" localName="number" normalizedValue="1"/>
 <p:typeDefinition p:name="xyzzy" p:targetNamespace="">
  <p:baseTypeDefinition p:name="pqrs">. . .</p:baseTypeDefinition>
  . . .
 </p:typeDefinition>
</child>

In the above, we can see both new properties with atomic values, e.g., p:validity and p:name, and new properties with new kinds of items as values, e.g., p:typeDefinition and p:baseTypeDefinition.

Accessing and navigating reflections with XPath

The core of our proposal is to give access to infosets, and in particular to infosets with extensions, by adding a generic extension function to XPath. This function, which we call reflect(), can be thought of as shifting from the data model for the source document, which contains only the information in the core infoset, to the data model for a reflection of the entire infoset, including any extensions. That new data model gives access to the extensions via their reflection.

reflect() takes no arguments as such, but in order to be recognized as an XPath extension function, it must be qualified by a namespace, and that namespace itself acts as a kind of argument. It is taken to denote the kind of reflection. For example, xmlns:anf="http://www.w3.org/2001/05/PSVInfosetExtension" . . .anf:reflect() would give access to an Alternating Normal Form reflection as defined in [ThoTob01], whereas our newer form would be accessed by xmlns:enf="http://www.ltg.ed.ac.uk/2003/06/PSVInfoset" . . .enf:reflect(). As required by XPath, when an implementation encounters such a form, it should either proceed with the reflection identified by the namespace, if it supports it, or fail (or fall back), if it doesn’t.
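The role of the namespace as an implicit argument can be pictured as a dispatch table keyed by namespace name. The Python sketch below is purely illustrative: the registry, the register decorator, the UnsupportedReflection exception, and the placeholder builder are all hypothetical names, and only the two namespace names are taken from the text above.

```python
ANF_NS = "http://www.w3.org/2001/05/PSVInfosetExtension"
ENF_NS = "http://www.ltg.ed.ac.uk/2003/06/PSVInfoset"


class UnsupportedReflection(Exception):
    """Raised when no reflection is registered for a namespace."""


# Hypothetical registry: namespace name -> function that builds a reflection.
REGISTRY = {}


def register(namespace):
    """Decorator that records a builder under the given namespace name."""
    def decorator(builder):
        REGISTRY[namespace] = builder
        return builder
    return decorator


@register(ENF_NS)
def enf_reflection(node):
    # Placeholder: a real builder would return the ENF reflection of `node`.
    return ("ENF", node)


def reflect(namespace, node):
    """Dispatch on the namespace qualifying reflect(); fail if unsupported."""
    try:
        builder = REGISTRY[namespace]
    except KeyError:
        raise UnsupportedReflection(namespace) from None
    return builder(node)


print(reflect(ENF_NS, "department"))
```

In this sketch only the ENF namespace is registered, so a call qualified by the ANF namespace raises UnsupportedReflection, which corresponds to the "fail" branch described above.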

For each node in the XPath evaluation context of the reflect() step, it can be thought of as creating an element node corresponding to the appropriate infoitem (element or attribute), with children and attributes for the reflection of all the infoitem-valued and atomic-valued properties thereof, respectively. As an example, the construct

<xsl:for-each select="/department">
 <xsl:value-of xmlns:p="http://www.ltg.ed.ac.uk/2003/06/PSVInfoset"
               select="p:reflect()/p:typeDefinition/p:baseTypeDefinition/@p:name"/>
</xsl:for-each>
would return pqrs, the name of the base type definition of the type definition which was used to validate the department node of our example.

In understanding this example, it’s important to keep clear which tree is in view. The first select expression above, namely /department, is evaluated against (the XPath data model for) the tree in Figure 1. The first part of the second select, namely p:reflect(), switches us to (the XPath data model for) the document element (note: not the document node) of the tree in Figure 5, which is the reflection of that tree with PSVI properties and infoitems added. The remaining steps in the second select, namely p:typeDefinition/p:baseTypeDefinition/@p:name, move us down through that reflected tree.
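The switch between the two trees can be imitated outside XSLT. In the Python sketch below, the reflection of Figure 5 (with its elided parts left out) is parsed as a separate document, and a stand-in reflect() function simply looks it up from the source node. The lookup table and function are assumptions made for illustration; a real processor would build the reflection from the PSVI.

```python
import xml.etree.ElementTree as ET

INFOSET_NS = "http://www.ltg.ed.ac.uk/2003/06/XMLInfoset"
PSVI_NS = "http://www.ltg.ed.ac.uk/2003/06/PSVInfoset"
NS = {"p": PSVI_NS}

# The source tree (Figure 1).
source = ET.fromstring('<department number="1"/>')

# The reflected tree (Figure 5), omitting the parts elided in the figure.
reflection = ET.fromstring(f"""
<child xmlns="{INFOSET_NS}" xmlns:p="{PSVI_NS}"
       namespaceName="" localName="department"
       p:validity="valid" p:validationAttempted="full">
  <attribute namespaceName="" localName="number" normalizedValue="1"/>
  <p:typeDefinition p:name="xyzzy" p:targetNamespace="">
    <p:baseTypeDefinition p:name="pqrs"/>
  </p:typeDefinition>
</child>
""")

# Hypothetical stand-in for reflect(): source node -> reflected element.
REFLECTIONS = {source: reflection}


def reflect(node):
    return REFLECTIONS[node]


# Equivalent of p:reflect()/p:typeDefinition/p:baseTypeDefinition/@p:name
base = reflect(source).find("p:typeDefinition/p:baseTypeDefinition", NS)
base_name = base.get(f"{{{PSVI_NS}}}name")
print(base_name)  # pqrs
```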

It should be clear that this approach is fully general. Any infoset extension which is well-defined and has a publicly-identified reflection is a candidate for access.

Which reflection?

For easy-to-understand access to the PSVI, and through it the components of the validating schema itself, what should the notional document type of the reflection look like? Predictability is crucial — knowledge of the PSVI and components as defined in [XML Schema] should be sufficient in most if not all cases to allow XPath expressions beginning with reflect() to be written. We suggested above that the use of a normal form was appropriate, and this would certainly contribute to predictability. So would a consistent conversion of names where literal transcription would violate XML syntax.

Bearing these considerations in mind, and thinking about the kind of things people will want to do with XPath expressions beginning with reflect(), we concluded that the existing Alternating Normal Form approach [ThoTob01] was too verbose. We’ve designed a new Normal Form, which we’re modestly calling ENF [Edinburgh Normal Form]. It’s defined as follows:

  • Elements encode relations to individuals (except for the document element, which is just a placeholder).
  • Attributes encode relations to atoms.
  • Schema types encode typed individuals and the types of atoms.

In the use of Edinburgh Normal Form for infoset reflection (see “Appendix: Schema documents for Edinburgh Normal Form”), we, furthermore, adopt a camel-case convention: spaces in the names of relations or types are elided, and the following character converted to upper case. For example, "Complex Type Definition" becomes "ComplexTypeDefinition" and "base type definition" becomes "baseTypeDefinition".

Names of properties of infoitems and schema components whose values are sets or sequences of individuals (as opposed to atoms) are treated specially in our reflection. Although in the original (e.g., "children", "attribute declarations") they occur once, have a collection as their value, and are naturally plural, in their ENF reflection, they are represented by zero or more occurrences of a given element, each of whose contents reflects one of the items in the set or sequence. Accordingly, their names are converted to singular in the reflection, e.g., "child", "attributeDeclaration".
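Both naming conventions are simple enough to sketch in Python. The camel-case rule follows the description above directly; the singularization rule is an assumption made for illustration (a one-entry table of irregular plurals plus removal of a trailing "s"), since the text gives only examples rather than a general rule. All function names are hypothetical.

```python
# Assumed table of irregular plurals; only "children" is attested in the text.
IRREGULAR = {"children": "child"}


def camel(name: str) -> str:
    """Elide spaces and upper-case the character that followed each one."""
    first, *rest = name.split(" ")
    return first + "".join(word[:1].upper() + word[1:] for word in rest)


def singular(name: str) -> str:
    """Assumed rule: table lookup, otherwise drop a trailing 's'."""
    if name in IRREGULAR:
        return IRREGULAR[name]
    return name[:-1] if name.endswith("s") else name


def enf_name(name: str, collection: bool = False) -> str:
    """ENF name for a property or type; collection-valued names go singular."""
    return camel(singular(name) if collection else name)


print(enf_name("Complex Type Definition"))                   # ComplexTypeDefinition
print(enf_name("base type definition"))                      # baseTypeDefinition
print(enf_name("children", collection=True))                 # child
print(enf_name("attribute declarations", collection=True))   # attributeDeclaration
```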

Re-entrancy and circularity

The domain of interest, namely the PSVI and schema components, exhibits both re-entrancy (more than one path from a given individual to another, e.g., one Particle may have several distinct {term}s which have the same individual as their {type definition}) and circularity (a path leading back to the individual from which it started, e.g., from an element information item (a child of the document element) via [validation context] and [child] back to itself).

Both of these phenomena might appear to cause difficulty for reflection of a PSVI using our ENF-based approach. In a reflected serialization, the fact that certain structures are not merely equivalent but, where re-entrancy occurs, actually identical is lost. Worse still, attempting to reflect and serialize in the face of circularity would lead to an infinite document stream.

We have taken a radical approach to these issues in the work reported here: we ignore them. That is, since for our purposes no actual serialization is required, we simply implement navigation lazily, one step at a time, over the actual re-entrant and circular underlying data, so that re-entrant paths actually do arrive at the same node. Similarly, paths which traverse circular connections just work like any other path, with the potential for never returning. This will only happen for certain kinds of non-deterministic paths, such as enf:reflect()/typeDefinition//*[@name="integer"], which one might write to test whether an element’s type was derived from integer. The use of double-slash (shorthand for the descendant-or-self axis) plus the fact that all baseTypeDefinition chains eventually lead to xs:anyType, which is its own baseTypeDefinition, means that there are an infinite number of paths to explore. This situation is parallel to that in LISP, where certain very simple functions (such as equal) must never be used on potentially circular data.

We have in mind to explore one obvious alternative in the case of circularity — during the evaluation of any given path, we could maintain a cache which mapped from nodes visited to the expression being evaluated against them, and return the empty node set any time a node was re-visited with an unchanged expression. This would, for instance, allow the path above to have the desired behavior.
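A much simplified Python sketch of this idea follows. It is not the planned implementation: instead of a general XPath evaluator it follows only baseTypeDefinition links, and the TypeDef class and the user-defined type names are hypothetical. The derivation chain integer, decimal, anySimpleType, anyType is that of the built-in XML Schema types.

```python
class TypeDef:
    """Toy type definition with a name and a base type definition."""

    def __init__(self, name, base=None):
        self.name = name
        # xs:anyType is its own base type definition, hence the cycle.
        self.base = base if base is not None else self


ANY_TYPE = TypeDef("anyType")
ANY_SIMPLE = TypeDef("anySimpleType", ANY_TYPE)
DECIMAL = TypeDef("decimal", ANY_SIMPLE)
INTEGER = TypeDef("integer", DECIMAL)
STRING = TypeDef("string", ANY_SIMPLE)
MY_INT = TypeDef("xyzzy", INTEGER)      # hypothetical user type
MY_STRING = TypeDef("pqrs", STRING)     # hypothetical user type


def derives_from(start, name):
    """Search the base type chain below `start` for a type called `name`.

    A cache of (node, expression) pairs stops the walk as soon as a node
    is re-visited with an unchanged expression, instead of looping forever.
    """
    visited = set()
    node = start.base  # like typeDefinition//*, the start node is excluded
    while (id(node), name) not in visited:
        visited.add((id(node), name))
        if node.name == name:
            return True
        node = node.base
    return False  # re-visit detected: treat as the empty node set


print(derives_from(MY_INT, "integer"))     # True
print(derives_from(MY_STRING, "integer"))  # False
```

Without the cache, the second call would never terminate, because the walk would keep returning to anyType.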

Accommodating to the Normal Form

The decision to use a normal form carries with it a very simple, indeed simplistic, data model of individuals, relations and atoms. This is inconsistent with some aspects of the PSVI and schema component structures. These involve properties whose values are pairs, disjunctions, and other deviations from the simple typed-individual/atom view. To cope with this, we have to articulate a sub-structure for the reflected versions of these properties to bring them into line with the normal form data model. We have defined a number of new ‘micro-components’ to address this (and will feed this back to the W3C XML Schema WG). For example, the {content type} property of Complex Type Definitions is defined as follows in [XML Schema]:

"One of empty, a simple type definition or a pair consisting of a ·content model· (i.e., a Particle (§2.2.3.2)) and one of mixed, element-only."

Our proposed corresponding micro-component looks like this:

Figure 6: Example of new micro-component

Schema Micro-component: content type

Property                    Value
{variety}                   One of empty, simple, mixed, or element-only
{particle}                  Required if {variety} is mixed or element-only, otherwise ·absent·. A Particle.
{simple type definition}    Required if {variety} is simple, otherwise ·absent·. A Simple Type Definition.

The reflection of this and the other micro-components can be seen in Figure 9.
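To make the regularization concrete, the Python sketch below converts the disjunctive value quoted from [XML Schema] into the three-property micro-component of Figure 6. The class and function names, and the use of empty stand-in classes for Particle and Simple Type Definition, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class Particle:
    """Stand-in for a Particle component."""


class SimpleTypeDefinition:
    """Stand-in for a Simple Type Definition component."""


@dataclass(frozen=True)
class ContentType:
    variety: str  # one of: empty, simple, mixed, element-only
    particle: Optional[Particle] = None
    simple_type_definition: Optional[SimpleTypeDefinition] = None

    def __post_init__(self):
        if self.variety not in ("empty", "simple", "mixed", "element-only"):
            raise ValueError(f"unknown variety: {self.variety}")
        # {particle} is required exactly when variety is mixed or element-only.
        if (self.particle is not None) != (self.variety in ("mixed", "element-only")):
            raise ValueError("particle does not match variety")
        # {simple type definition} is required exactly when variety is simple.
        if (self.simple_type_definition is not None) != (self.variety == "simple"):
            raise ValueError("simple type definition does not match variety")


def from_spec_value(value):
    """Convert 'empty', a simple type definition, or a (particle, variety) pair."""
    if value == "empty":
        return ContentType("empty")
    if isinstance(value, tuple):
        particle, variety = value
        return ContentType(variety, particle=particle)
    return ContentType("simple", simple_type_definition=value)


print(from_spec_value("empty"))
print(from_spec_value((Particle(), "mixed")).variety)
print(from_spec_value(SimpleTypeDefinition()).variety)
```

Every property of the result is either an atom (variety) or a single individual, which is what the normal form data model requires.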

Implementation

Our implementation strategy has been to leverage to the greatest extent possible facilities provided by appropriate processors, namely Xerces, a parser which provides a PSVI implementation, and Saxon, which provides XSLT and appropriate extension mechanisms. Despite relying on external functionality, we have defined an architecture that allows us to be flexible in the case of changes to the underlying processors and to minimize the amount of manual coding.

Our starting point is an annotation of the W3C XML Schema defining the ENF characterization of reflections. This annotation expresses the following data:

  • whether the processor implements (or fails to implement) specific functionality
  • for each infoitem or component, the type used by the processor to represent it
  • for each property of an infoitem or component, the method used to extract the relevant data

An example of a type specification is:

<xerces-type type="org.apache.xerces.impl.xs.psvi.XSTypeDefinition"/>
which indicates that the containing item (in this case the Complex Type Definition for Type Definition) is represented by the given Xerces type. An example of a property annotation is:
<xerces method="getNamespace"/>
which states that the named method is used by Xerces to extract the relevant data. Much of the annotation data can be derived automatically from the ENF characterization, because e.g., the method name is just ‘get’ plus the (capitalized) property name. The annotation shown above is required to override this automatic approach, because the name implied by the ENF characterization is targetNamespace, so the automatically generated name would be wrong.
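The default-plus-override scheme for method names can be sketched as follows. The override table and function name are hypothetical; the sketch only derives names in the way described above and makes no claim about which methods a given Xerces version actually provides, beyond the getNamespace override stated in the text.

```python
# Overrides taken from explicit annotations; only this one is given in the text.
OVERRIDES = {"targetNamespace": "getNamespace"}


def getter_name(prop: str) -> str:
    """Return the annotated method name, or 'get' + capitalized property name."""
    if prop in OVERRIDES:
        return OVERRIDES[prop]
    return "get" + prop[:1].upper() + prop[1:]


print(getter_name("name"))             # getName
print(getter_name("targetNamespace"))  # getNamespace
```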

Because of inexact matches between the data models used by the XML Schema specification from which ENF is derived and the Xerces PSVI and schema implementations, we allow two further kinds of annotation:

  • whether the use of a named method to decode a numeric key (typically representing bit-wise combinations) is necessary
  • whether some more complex computation has to be performed in order to derive appropriate data

By including these annotations in the base from which the ENF schema documents are produced via XSLT, we can produce a version of the ENF schema documents which includes annotation and appinfo elements wrapping the xerces-type and xerces information. We then transform those schema documents to produce fragments of Java code. These are then assembled into a Java class containing a single method which examines single steps from an XPath query in Saxon and includes appropriate data in a return structure representing the enumeration of a node set. From Saxon’s perspective, this structure is indistinguishable from those manipulated during normal processing. We achieve this by implementing Saxon’s AxisEnumeration interface. Saxon’s modular design means this is the only point of intervention required: the semantics of the various XPath axes, predicates etc. are almost completely invisible to us.

Because the schema documents for the ENF reflection are automatically generated from XML descriptions of PSVI infoitems and properties and schema components and properties, this entire datapath could easily be re-purposed to produce an implementation of some other normal form reflection, or a reflection of some other infoset extension altogether, with a very modest amount of work.

Runtime architecture

The runtime architecture of the system uses Xerces to parse the input document, with a filter being added that allows items in the PSVI corresponding to the input to be cached for later retrieval. This filter captures all data present in the Xerces PSVI implementation.

The input document is then processed by Saxon in the normal way, applying one or more user-defined stylesheets, which may refer to the extension function we have defined. On encountering a call to the extension function reflect() at some point in evaluating a path expression against the input document structure, we retrieve the cached PSVI item corresponding to that point. Queries within this item are then handled by a call to the generated code described above.

With the exception of the generated code, the overall implementation is quite compact, requiring a handful of major classes and a similar number of classes required to match up data representations or encodings. By contrast, the generated code is an order of magnitude larger.

Availability

As of this writing, our implementation is not yet ready for public use, but when it is it will be available at http://www.ltg.ed.ac.uk/software/reflect/. Announcements of availability will be made in the appropriate fora.

Further work

Now that we’ve built it, we want to use it! In particular, our earlier work on normal forms was done in support of a declarative approach to data binding ([KrupTho01]), and we now plan to return to this work and implement it on top of the mechanism described here.

We are also interested in using our ENF reflection for actual serialization, which means dealing with re-entrancy and circularity. We have a plan to use ID/IDREF for this, in a way similar to that adopted in the ANF approach of [ThoTob01], but in a deterministic way. That is, in that ANF approach, the full expression of an item which is the value of several properties could appear as the child of either one of them, and be pointed to from the other. Navigating such a serialized reflection with XPath is very messy, as both possibilities have to be allowed for. We have in mind a variant of the schemas given in “Appendix: Schema documents for Edinburgh Normal Form” in which the reflection of a property is either always via IDREF or always via inclusion.

Appendix: Schema documents for Edinburgh Normal Form

The schema documents which follow cover the extensions to the infoset described in [XML Schema], in two schema documents. The first is for the PSVI as such, the second for schema components.

Figure 7: ENF schema document for PSVI reflection
<xs:schema xmlns="http://www.w3.org/2003/06/psviInfoset"
           xmlns:i="http://www.w3.org/2001/05/XMLInfoset"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://www.w3.org/2003/06/psviInfoset">
  <xs:import namespace="http://www.w3.org/2001/05/XMLInfoset"/>
  <xs:annotation>
    <xs:documentation>Auto-generated from psviPropsDescr.xml</xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="auxSimpleTypes.xsd"/>
  <xs:include schemaLocation="microENF.xsd"/>
  <xs:complexType name="AttributeInformationItem">
    <xs:sequence>
      <xs:element ref="validationContext"/>
      <xs:element name="declaration" type="AttributeDeclaration" minOccurs="0"/>
      <xs:element name="typeDefinition" type="SimpleTypeDefinition"
                  minOccurs="0"/>
      <xs:element ref="memberTypeDefinition" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="validity" use="required"/>
    <xs:attribute ref="validationAttempted" use="required"/>
    <xs:attribute ref="schemaSpecified"/>
    <xs:attribute ref="schemaErrorCode"/>
    <xs:attribute ref="schemaNormalizedValue"/>
    <xs:attribute ref="typeDefinitionType"/>
    <xs:attribute ref="typeDefinitionNamespace"/>
    <xs:attribute ref="typeDefinitionAnonymous"/>
    <xs:attribute ref="typeDefinitionName"/>
    <xs:attribute ref="memberTypeDefinitionNamespace"/>
    <xs:attribute ref="memberTypeDefinitionAnonymous"/>
    <xs:attribute ref="memberTypeDefinitionName"/>
    <xs:attribute ref="schemaDefault"/>
  </xs:complexType>
  <xs:complexType name="ElementInformationItem">
    <xs:sequence>
      <xs:element ref="validationContext"/>
      <xs:element name="declaration" type="ElementDeclaration" minOccurs="0"/>
      <xs:element name="typeDefinition" type="TypeDefinition" minOccurs="0"/>
      <xs:element ref="memberTypeDefinition" minOccurs="0"/>
      <xs:element ref="icBinding" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="notation" minOccurs="0"/>
      <xs:element ref="schemaInformation" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="iiBinding" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="validity" use="required"/>
    <xs:attribute ref="validationAttempted" use="required"/>
    <xs:attribute ref="schemaErrorCode"/>
    <xs:attribute ref="nil"/>
    <xs:attribute ref="schemaNormalizedValue"/>
    <xs:attribute ref="typeDefinitionType"/>
    <xs:attribute ref="typeDefinitionNamespace"/>
    <xs:attribute ref="typeDefinitionAnonymous"/>
    <xs:attribute ref="typeDefinitionName"/>
    <xs:attribute ref="memberTypeDefinitionNamespace"/>
    <xs:attribute ref="memberTypeDefinitionAnonymous"/>
    <xs:attribute ref="memberTypeDefinitionName"/>
    <xs:attribute ref="schemaDefault"/>
    <xs:attribute ref="schemaSpecified"/>
    <xs:attribute ref="notationSystem"/>
    <xs:attribute ref="notationPublic"/>
  </xs:complexType>
  <xs:complexType name="icBinding">
    <xs:sequence>
      <xs:element ref="definition"/>
      <xs:element ref="keyBinding" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="iiBinding">
    <xs:sequence>
      <xs:element ref="binding" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="id" use="required"/>
  </xs:complexType>
  <xs:complexType name="schemaInfo">
    <xs:sequence>
      <xs:element ref="schemaComponent" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="schemaDocument" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="schemaNamespace"/>
  </xs:complexType>
  <xs:complexType name="schemaDoc">
    <xs:sequence>
      <xs:element ref="document" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="documentLocation"/>
  </xs:complexType>
  <xs:element name="binding" type="i:Element"/>
  <xs:element name="definition" type="Identity-ConstraintDefinition"/>
  <xs:element name="document" type="i:Document"/>
  <xs:element name="icBinding" type="icBinding"/>
  <xs:element name="iiBinding" type="iiBinding"/>
  <xs:element name="keyBinding" type="KeyBinding"/>
  <xs:element name="notation" type="NotationDeclaration"/>
  <xs:element name="schemaComponent" type="Component"/>
  <xs:element name="schemaDocument" type="schemaDoc"/>
  <xs:element name="schemaInformation" type="schemaInfo"/>
  <xs:element name="validationContext" type="i:Element"/>
  <xs:attribute name="documentLocation" type="xs:anyURI"/>
  <xs:attribute name="id" type="xs:NCName"/>
  <xs:attribute name="memberTypeDefinitionAnonymous" type="xs:boolean"/>
  <xs:attribute name="memberTypeDefinitionName" type="xs:NCName"/>
  <xs:attribute name="memberTypeDefinitionNamespace" type="xs:anyURI"/>
  <xs:attribute name="nil" type="xs:boolean"/>
  <xs:attribute name="notationPublic" type="xs:string"/>
  <xs:attribute name="notationSystem" type="xs:anyURI"/>
  <xs:attribute name="schemaDefault" type="xs:string"/>
  <xs:attribute name="schemaErrorCode" type="errorCode"/>
  <xs:attribute name="schemaNamespace" type="xs:anyURI"/>
  <xs:attribute name="schemaNormalizedValue" type="xs:string"/>
  <xs:attribute name="schemaSpecified" type="specified"/>
  <xs:attribute name="typeDefinitionAnonymous" type="xs:boolean"/>
  <xs:attribute name="typeDefinitionName" type="xs:NCName"/>
  <xs:attribute name="typeDefinitionNamespace" type="xs:anyURI"/>
  <xs:attribute name="typeDefinitionType" type="typeType"/>
  <xs:attribute name="validationAttempted" type="validationAttempted"/>
  <xs:attribute name="validity" type="validity"/>
</xs:schema>
Figure 8: ENF schema document for schema component reflection
<xs:schema xmlns="http://www.w3.org/2003/06/psviInfoset"
           xmlns:i="http://www.w3.org/2001/05/XMLInfoset"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://www.w3.org/2003/06/psviInfoset">
  <xs:import namespace="http://www.w3.org/2001/05/XMLInfoset"/>
  <xs:annotation>
    <xs:documentation>Auto-generated from schemaCompDescr.xml</xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="auxSimpleTypes.xsd"/>
  <xs:include schemaLocation="microENF.xsd"/>
  <xs:complexType name="AttributeDeclaration">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:element ref="simpleTypeDefinition"/>
          <xs:element ref="scope" minOccurs="0"/>
          <xs:element ref="valueConstraint" minOccurs="0"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="name" use="required"/>
        <xs:attribute ref="targetNamespace"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ElementDeclaration">
    <xs:complexContent>
      <xs:restriction base="Term">
        <xs:sequence>
          <xs:element ref="typeDefinition"/>
          <xs:element ref="scope" minOccurs="0"/>
          <xs:element ref="valueConstraint" minOccurs="0"/>
          <xs:element ref="constraintDefinition" minOccurs="0"
                      maxOccurs="unbounded"/>
          <xs:element ref="classExemplar" minOccurs="0"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="name" use="required"/>
        <xs:attribute ref="targetNamespace"/>
        <xs:attribute ref="nillable" use="required"/>
        <xs:attribute name="final" type="xs:reducedDerivationControl"
                      use="required"/>
        <xs:attribute name="exact" type="xs:derivationControl" use="required"/>
        <xs:attribute ref="abstract" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="TypeDefinition" abstract="true">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:element name="baseTypeDefinition" type="TypeDefinition"/>
          <xs:any minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute ref="name"/>
        <xs:attribute ref="targetNamespace"/>
        <xs:attribute name="final" type="xs:derivationControl" use="required"/>
        <xs:anyAttribute/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Facet" abstract="true">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
  <xs:complexType name="FundamentalFacet" abstract="true">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
  <xs:complexType name="Component" abstract="true">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
  <xs:complexType name="Term" abstract="true">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:any minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ComplexTypeDefinition">
    <xs:complexContent>
      <xs:restriction base="TypeDefinition">
        <xs:sequence>
          <xs:element name="baseTypeDefinition" type="TypeDefinition"/>
          <xs:element ref="attributeUse" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="attributeWildcard" minOccurs="0"/>
          <xs:element ref="contentType"/>
          <xs:element ref="annotation" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute ref="name"/>
        <xs:attribute ref="targetNamespace"/>
        <xs:attribute ref="derivationMethod" use="required"/>
        <xs:attribute name="final" type="xs:reducedDerivationControl"
                      use="required"/>
        <xs:attribute ref="abstract" use="required"/>
        <xs:attribute name="exact" type="xs:reducedDerivationControl"
                      use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="AttributeUse">
    <xs:sequence>
      <xs:element ref="attributeDeclaration"/>
      <xs:element ref="valueConstraint" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="required" use="required"/>
  </xs:complexType>
  <xs:complexType name="AttributeGroupDefinition">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:element ref="attributeUse" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="attributeWildcard" minOccurs="0"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="name" use="required"/>
        <xs:attribute ref="targetNamespace"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ModelGroupDefinition">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:element ref="modelGroup"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="name" use="required"/>
        <xs:attribute ref="targetNamespace"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ModelGroup">
    <xs:complexContent>
      <xs:restriction base="Term">
        <xs:sequence>
          <xs:element ref="particle" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="compositor" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Particle">
    <xs:sequence>
      <xs:element ref="term"/>
    </xs:sequence>
    <xs:attribute ref="minOccurs" use="required"/>
    <xs:attribute ref="maxOccurs" use="required"/>
  </xs:complexType>
  <xs:complexType name="Wildcard">
    <xs:complexContent>
      <xs:restriction base="Term">
        <xs:sequence>
          <xs:element ref="namespaceConstraint" minOccurs="0"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="processContents" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Identity-ConstraintDefinition">
    <xs:sequence>
      <xs:element ref="field" maxOccurs="unbounded"/>
      <xs:element ref="referencedKey" minOccurs="0"/>
      <xs:element ref="annotation" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="name" use="required"/>
    <xs:attribute ref="targetNamespace"/>
    <xs:attribute ref="constraintName" use="required"/>
    <xs:attribute ref="selector" use="required"/>
  </xs:complexType>
  <xs:complexType name="NotationDeclaration">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="name" use="required"/>
        <xs:attribute ref="targetNamespace"/>
        <xs:attribute ref="systemIdentifier"/>
        <xs:attribute ref="publicIdentifier"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Annotation">
    <xs:sequence>
      <xs:element ref="applicationInformation" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element ref="userInformation" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="attribute" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="SimpleTypeDefinition">
    <xs:complexContent>
      <xs:restriction base="TypeDefinition">
        <xs:sequence>
          <xs:element name="baseTypeDefinition" type="SimpleTypeDefinition"/>
          <xs:element ref="facet" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="fundamentalFacet" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="primitiveTypeDefinition" minOccurs="0"/>
          <xs:element ref="itemTypeDefinition" minOccurs="0"/>
          <xs:element ref="memberTypeDefinition" minOccurs="0"
                      maxOccurs="unbounded"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="name"/>
        <xs:attribute ref="targetNamespace"/>
        <xs:attribute name="final" type="xs:derivationControl" use="required"/>
        <xs:attribute ref="variety" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="schema">
    <xs:sequence>
      <xs:element ref="typeDefinition" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="attributeDeclaration" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="elementDeclaration" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="attributeGroupDefinition" minOccurs="0"
                  maxOccurs="unbounded"/>
      <xs:element ref="modelGroupDefinition" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="notationDeclaration" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="annotation" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="annotation" type="Annotation"/>
  <xs:element name="applicationInformation" type="i:Element"/>
  <xs:element name="attribute" type="i:Attribute"/>
  <xs:element name="attributeDeclaration" type="AttributeDeclaration"/>
  <xs:element name="attributeGroupDefinition" type="AttributeGroupDefinition"/>
  <xs:element name="attributeUse" type="AttributeUse"/>
  <xs:element name="attributeWildcard" type="Wildcard"/>
  <xs:element name="baseTypeDefinition" type="TypeDefinition"/>
  <xs:element name="classExemplar" type="ElementDeclaration"/>
  <xs:element name="constraintDefinition" type="Identity-ConstraintDefinition"/>
  <xs:element name="contentType" type="ContentType"/>
  <xs:element name="elementDeclaration" type="ElementDeclaration"/>
  <xs:element name="facet" type="Facet"/>
  <xs:element name="field" type="xpathExpr"/>
  <xs:element name="fundamentalFacet" type="FundamentalFacet"/>
  <xs:element name="itemTypeDefinition" type="SimpleTypeDefinition"/>
  <xs:element name="memberTypeDefinition" type="SimpleTypeDefinition"/>
  <xs:element name="modelGroup" type="ModelGroup"/>
  <xs:element name="modelGroupDefinition" type="ModelGroupDefinition"/>
  <xs:element name="namespaceConstraint" type="NamespaceConstraint"/>
  <xs:element name="notationDeclaration" type="NotationDeclaration"/>
  <xs:element name="particle" type="Particle"/>
  <xs:element name="primitiveTypeDefinition" type="SimpleTypeDefinition"/>
  <xs:element name="referencedKey" type="Identity-ConstraintDefinition"/>
  <xs:element name="scope" type="Scope"/>
  <xs:element name="simpleTypeDefinition" type="SimpleTypeDefinition"/>
  <xs:element name="term" type="Term"/>
  <xs:element name="typeDefinition" type="TypeDefinition"/>
  <xs:element name="userInformation" type="i:Element"/>
  <xs:element name="valueConstraint" type="ValueConstraint"/>
  <xs:attribute name="abstract" type="xs:boolean"/>
  <xs:attribute name="compositor" type="compositor"/>
  <xs:attribute name="constraintName" type="icName"/>
  <xs:attribute name="derivationMethod" type="xs:reducedDerivationControl"/>
  <xs:attribute name="maxOccurs" type="xs:allNNI"/>
  <xs:attribute name="minOccurs" type="xs:nonNegativeInteger"/>
  <xs:attribute name="name" type="xs:NCName"/>
  <xs:attribute name="nillable" type="xs:boolean"/>
  <xs:attribute name="processContents" type="processContents"/>
  <xs:attribute name="publicIdentifier" type="publicID"/>
  <xs:attribute name="required" type="xs:boolean"/>
  <xs:attribute name="selector" type="xpathExpr"/>
  <xs:attribute name="systemIdentifier" type="xs:anyURI"/>
  <xs:attribute name="targetNamespace" type="xs:anyURI"/>
  <xs:attribute name="variety" type="stVariety"/>
</xs:schema>
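The derivation structure in the schema document above can itself be inspected with path expressions. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module and its limited XPath subset; it is not the tooling used by the authors. The excerpt is copied from the listing above, while the names `SCHEMA_EXCERPT`, `base_types` and `ancestry` are hypothetical helpers introduced only for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Excerpt of the component reflection schema (three types only).
SCHEMA_EXCERPT = """\
<xs:schema xmlns="http://www.w3.org/2003/06/psviInfoset"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/2003/06/psviInfoset">
  <xs:complexType name="Component" abstract="true">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
  <xs:complexType name="Term" abstract="true">
    <xs:complexContent>
      <xs:restriction base="Component">
        <xs:sequence>
          <xs:any minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="ModelGroup">
    <xs:complexContent>
      <xs:restriction base="Term">
        <xs:sequence>
          <xs:element ref="particle" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="annotation" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute ref="compositor" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
"""


def base_types(schema_root):
    """Map each top-level complex type name to its restriction base (or None)."""
    bases = {}
    for ctype in schema_root.findall("xs:complexType", NS):
        restriction = ctype.find("xs:complexContent/xs:restriction", NS)
        bases[ctype.get("name")] = (
            restriction.get("base") if restriction is not None else None
        )
    return bases


def ancestry(bases, name):
    """Follow restriction bases upward from `name`, stopping on a repeat."""
    chain = [name]
    while True:
        parent = bases.get(chain[-1])
        if parent is None or parent in chain:
            return chain
        chain.append(parent)


bases = base_types(ET.fromstring(SCHEMA_EXCERPT))
print(ancestry(bases, "ModelGroup"))  # ['ModelGroup', 'Term', 'Component']
```

The sketch compares the unprefixed `base` values as plain strings, which is adequate here only because every type in the excerpt lives in the schema's default namespace; a fuller treatment would resolve them as QNames.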
Figure 9: ENF schema document for micro component reflection
<xs:schema xmlns="http://www.w3.org/2003/06/psviInfoset"
           xmlns:i="http://www.w3.org/2001/05/XMLInfoset"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://www.w3.org/2003/06/psviInfoset">
  <xs:import namespace="http://www.w3.org/2001/05/XMLInfoset"/>
  <xs:annotation>
    <xs:documentation>Auto-generated from </xs:documentation>
  </xs:annotation>
  <xs:include schemaLocation="auxSimpleTypes.xsd"/>
  <xs:include schemaLocation="microENF.xsd"/>
  <xs:complexType name="NamespaceConstraint">
    <xs:sequence/>
    <xs:attribute name="variety" type="ncVariety" use="required"/>
    <xs:attribute ref="namespaces"/>
  </xs:complexType>
  <xs:complexType name="Scope">
    <xs:sequence>
      <xs:element ref="parent" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="variety" type="scopeVariety" use="required"/>
  </xs:complexType>
  <xs:complexType name="ValueConstraint">
    <xs:sequence>
      <xs:element ref="value"/>
    </xs:sequence>
    <xs:attribute name="variety" type="vcVariety" use="required"/>
  </xs:complexType>
  <xs:complexType name="ContentType">
    <xs:sequence>
      <xs:element ref="particle" minOccurs="0"/>
      <xs:element ref="simpleTypeDefinition" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="variety" type="ctVariety" use="required"/>
  </xs:complexType>
  <xs:complexType name="KeyBinding">
    <xs:sequence>
      <xs:element ref="node"/>
    </xs:sequence>
    <xs:attribute ref="key" use="required"/>
  </xs:complexType>
  <xs:element name="node" type="i:Element"/>
  <xs:element name="parent" type="Component"/>
  <xs:element name="value" type="abstractAny"/>
  <xs:attribute name="key" type="kvList"/>
  <xs:attribute name="namespaces" type="xs:anyURI"/>
</xs:schema>
Figure 10: ENF schema document for auxiliary simple types
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/2003/06/psviInfoset">
  <xs:annotation>
    <xs:documentation>Simple type definitions for property-encoding elements in
the auto-generated component and PSVI schemas</xs:documentation>
  </xs:annotation>
  <xs:simpleType name="compositor">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="all"/>
      <xs:enumeration value="choice"/>
      <xs:enumeration value="sequence"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="stVariety">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="atomic"/>
      <xs:enumeration value="list"/>
      <xs:enumeration value="union"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="processContents">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="skip"/>
      <xs:enumeration value="strict"/>
      <xs:enumeration value="lax"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="publicID">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="icName">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="key"/>
      <xs:enumeration value="keyref"/>
      <xs:enumeration value="unique"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="xpathExpr">
    <xs:restriction base="xs:token"/>
  </xs:simpleType>
  <xs:simpleType name="validity">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="valid"/>
      <xs:enumeration value="notKnown"/>
      <xs:enumeration value="invalid"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="validationAttempted">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="full"/>
      <xs:enumeration value="partial"/>
      <xs:enumeration value="none"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="errorCode">
    <xs:list itemType="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="typeType">
    <xs:annotation>
      <xs:documentation>Should be abstract, with one vacuous and
                     one simple-only restriction</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="simple"/>
      <xs:enumeration value="complex"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="specified">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="schema"/>
      <xs:enumeration value="infoset"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="kvList">
    <xs:list itemType="xs:anySimpleType"/>
  </xs:simpleType>
  <xs:simpleType name="ncVariety">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="any"/>
      <xs:enumeration value="positive"/>
      <xs:enumeration value="not"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="scopeVariety">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="global"/>
      <xs:enumeration value="local"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="vcVariety">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="default"/>
      <xs:enumeration value="fixed"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="ctVariety">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="empty"/>
      <xs:enumeration value="simple"/>
      <xs:enumeration value="element-only"/>
      <xs:enumeration value="mixed"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="abstractAny" abstract="true">
    <xs:annotation>
      <xs:documentation>Must be sub-typed with xsi:type in any reflection</xs:documentation>
    </xs:annotation>
    <xs:simpleContent>
      <xs:extension base="xs:anySimpleType"/>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
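The enumerated simple types above determine which literal values may appear in property-encoding attributes such as `validity` and `validationAttempted`. As a minimal sketch, again assuming Python's standard `xml.etree.ElementTree` module rather than any tool described in this paper, the permitted values can be read directly from the schema document. The two type definitions are copied from the listing above; `AUX_SCHEMA_EXCERPT` and `enumerated_values` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Two of the enumerated types from the auxiliary simple types schema.
AUX_SCHEMA_EXCERPT = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/2003/06/psviInfoset">
  <xs:simpleType name="validity">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="valid"/>
      <xs:enumeration value="notKnown"/>
      <xs:enumeration value="invalid"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="validationAttempted">
    <xs:restriction base="xs:NCName">
      <xs:enumeration value="full"/>
      <xs:enumeration value="partial"/>
      <xs:enumeration value="none"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def enumerated_values(schema_root, type_name):
    """Return the enumeration literals of a named simple type, in document order."""
    path = (
        f"xs:simpleType[@name='{type_name}']"
        "/xs:restriction/xs:enumeration"
    )
    return [enum.get("value") for enum in schema_root.findall(path, NS)]


aux_root = ET.fromstring(AUX_SCHEMA_EXCERPT)
print(enumerated_values(aux_root, "validity"))
# ['valid', 'notKnown', 'invalid']
```

A type name that is not defined in the excerpt simply yields an empty list, so a caller can distinguish "no such enumerated type" from a type with permitted values.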

Acknowledgments

The work reported here was largely supported by a research grant from the UK Engineering and Physical Sciences Research Council. The authors have benefitted greatly from discussions with Richard Tobin, to whom thanks, but no blame for our mistakes, are due. Thanks also to John Cowan, whose insistence that Infoset extensions should always define a reflection stimulated our thinking about this.


Bibliography

[Infoset] W3C. XML Information Set, John Cowan and Richard Tobin, eds. 16 March 2001. See http://www.w3.org/TR/2001/WD-xml-infoset-20010316/.

[KrupTho01] Krupnikov, K. Ari and Henry S. Thompson. 2001. Data Binding Using W3C XML Schema Annotations. In Proceedings of XML 2001, IDEAlliance, Alexandria, VA, USA. Available online at http://www.idealliance.org/papers/xml2001/papers/pdf/06-05-01.pdf and, in a pre-publication version, at http://www.ltg.ed.ac.uk/~ht/normalForms.html.

[Tho01] Thompson, Henry S. 2001. Normal Form Conventions for XML Representations of Structured Data. In Proceedings of XML 2001, IDEAlliance, Alexandria, VA, USA. Available online at http://www.idealliance.org/papers/xml2001/papers/pdf/06-05-01.pdf and, in a pre-publication version, at http://www.ltg.ed.ac.uk/~ht/normalForms.html.

[ThoTob01] Tobin, Richard and Henry S. Thompson. 2001. A schema for serialized infosets, World Wide Web Consortium, Cambridge, MA. Available online at http://www.w3.org/2001/05/serialized-infoset-schema.html.

[XML Schema] W3C. XML Schema Part 1: Structures, David Beech, Murray Maloney, Noah Mendelsohn, and Henry S. Thompson, eds. 2 May 2001. See http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/.

[XPath] W3C. XML Path Language, James Clark and Steve DeRose, eds. 16 November 1999. See http://www.w3.org/TR/1999/REC-xpath-19991116.



Uniform access to infosets via reflection

Henry S. Thompson [University of Edinburgh, Markup Technology, World Wide Web Consortium]
K. Ari Krupnikov [University of Edinburgh]
Jo Calder [Fourth Person]