XIndirect: Indirect addressing for XML

W. Eliot Kimber
eliot@isogen.com

Abstract

This paper describes and explains the XIndirect facility, a W3C Note. The XIndirect Note defines a simple mechanism for representing indirect addresses that can be used with other XML-based linking and addressing facilities, such as XLink and XInclude. XIndirect is motivated primarily by the requirements of XML authoring in which the management of pointers among systems of documents under constant revision cannot be easily satisfied by the direct pointers provided by XLink and XInclude. Indirect addressing is inherently expensive to implement because of both the processing demands of multi-step pointers and the increased system complexity required to do the processing. XLink and XPointer (and by extension, XInclude) explicitly and appropriately avoid indirection in order to provide the simplest possible solution for the delivery of hyperlinked documents, especially in the context of essentially unbounded systems, such as the World Wide Web. XIndirect enables indirect addressing when needed without adding complexity to the existing XML linking and addressing facilities — by defining indirection as a separate, independent facility, processors that only need to support delivery of documents are not required to support indirection simply in order to support XLink or XInclude. Rather, when indirection management is required, developers of XML information management systems can limit the support for indirection to closed systems of controlled scope where indirection is practical to implement.

This paper illustrates some of the key use cases that motivate the need for the XIndirect facility, describes the facility itself, and discusses a reference implementation of the XIndirect facility.

Keywords: XLink; Editing/Authoring; XPointer

W. Eliot Kimber

W. Eliot Kimber is a Consultant at ISOGEN. Eliot is a founding member of the XML Working Group, Co-editor of ISO/IEC 10744:1997 (HyTime), and Co-editor of ISO/IEC 10743, Standard Music Description Language. Eliot is also involved in the STEP and SGML Harmonization effort, which has led to a deeper appreciation of the power and utility of formal data modeling as a design and analysis tool. Eliot writes and speaks frequently on the subject of SGML, XML, hyperlinking, and related topics. When not trying to wrestle chaotic data into orderly structures, Eliot enjoys swimming, biking, and guitar playing. Eliot is a devoted husband and dog owner.

W. Eliot Kimber [ISOGEN International, LLC]

Extreme Markup Languages 2003® (Montréal, Québec)

Copyright © 2003 W. Eliot Kimber. Reproduced with permission.

Introduction: authoring, linking, and XML

The Web is primarily a delivery environment. While Tim Berners-Lee’s original vision included the ability to author Web-based documents directly, in practice this has not happened, for a number of reasons. XML was specifically designed to be “SGML for the Web”. A side effect of this design choice is that XML limits itself to Web-specific technology for a number of core infrastructure facilities, including addressing, which uses URIs (now IRIs) to address resources on the Web. The XIndirect [XInd] facility is designed to add to the XML toolbox those features needed by hyperdocument authoring support systems, without adding any complicating features to existing linking and addressing facilities such as XLink [XLink], XPointer [XPointer], and XInclude [XIncl].

Hyperdocument management is, first and foremost, the task of managing pointers and addresses. Thus, the way in which your documents address resources largely determines which management tasks you can perform.

Direct and indirect addressing

The XML family of recommendations extends URIs with XML-specific addressing facilities, such as XPointer. These provide additional addressing functionality for pointing to specific XML document components, but they explicitly do not provide any form of indirect addressing. This is a considered design choice, driven both by the desire to keep the overall system as simple as possible and by the recognition that, for delivery, indirection is often counterproductive, adding implementation and processing complexity that is not needed. For delivery of a coordinated body of information, where the data to be delivered is essentially static, all the pointers can be hardened into direct pointers.

Indirect addressing is the act of using an intermediate object between the initial pointer (for example, a link) and the ultimate intended target. An everyday example of indirect addressing is Internet domain names: when you point to an address, such as “www.example.org”, you are really pointing to an entry in a DNS [Domain Name System] table which maps the domain name to a specific numeric IP address representing a physical server on the network. This is a simple two-stage indirection that allows people and documents to utter more-or-less meaningful names instead of meaningless numeric strings. The cost of this indirection is that the DNS table entries must be maintained as the physical addresses of servers change and as new servers and domain names come online. This cost is more than offset by the value of meaningful names and by eliminating the need to update every document that points to a given server each time the server’s numeric IP address changes.

Authoring and pointer management

Unlike delivery, where the data being delivered is essentially static, authoring involves documents that are both dynamic, in the sense that they are under constant revision, and mobile, in the sense that they may be moved to different locations as they are worked on, rather than being served from a single location.

One practical reality of document authoring is that authors cannot always work against a live server. If they could, then the content management server could handle all addressing details, just as Web servers do for Web sites. However, there are many scenarios in which editing against a live server is either not possible or not practical, such as working on a laptop on a plane, working from a location that only has intermittent dial-up access, or working on an overloaded corporate local area network.

This means that pointers to resources cannot assume the presence of a single server. It also means that the relative locations of the resources may vary depending on, for example, whether you have a local copy of a resource or you can get it from an active server.

Another issue with authoring is change management. The issues of change management and linking are covered in detail in [Heintz], but in short, the key problem is one of reacting to changes in the locations of link targets. If there are 100 direct pointers to a resource and the location of that resource changes, then all 100 pointers have to be changed. However, if the resource is pointed to indirectly, such that there are 100 pointers to a single indirection object that then points to the resource, when the resource’s location changes only one pointer has to change: the indirection object. In addition, the indirection object can be physically separate from any of the documents that point to it, so that it can be modified independently of any other documents in the system.
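The arithmetic of this argument can be sketched in a few lines of Python. This is an illustration only: the dictionaries, the document names, the `repo/...` locations, and the identifier `ind-spec` are hypothetical stand-ins for documents, resource locations, and an indirection object, and are not part of XIndirect.

```python
# Hypothetical model: 100 documents all point at one shared indirection object.
indirectors = {"ind-spec": "repo/v1/spec.xml"}   # indirector id -> target location
pointers = {f"doc{n:03d}.xml": "ind-spec" for n in range(100)}  # document -> indirector id


def resolve(doc: str) -> str:
    """Follow the document's pointer through the indirector to the target."""
    return indirectors[pointers[doc]]


# The resource moves: exactly one entry changes and no document is touched.
indirectors["ind-spec"] = "repo/v2/spec.xml"

moved = [doc for doc in pointers if resolve(doc) == "repo/v2/spec.xml"]
print(len(moved))  # number of documents that now reach the new location
```

With direct pointers, the same move would require rewriting the value stored for each of the 100 documents.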

In short, direct pointers are, in general, unmanageable for non-trivial systems of interlinked documents authored by groups of authors. Because you cannot depend on the presence of live servers, you cannot do things like use server-side URL redirection to provide indirection, something you can do for Web-delivery of resources.

Because management of hyperdocument authoring requires indirection, linking mechanisms like XLink and XPointer are, by themselves, not suitable for authoring, simply because they do not provide any indirection facility. This leaves the developers of authoring support systems only a couple of choices for satisfying the indirection requirement. If they want to be as standards-based as possible they can use the ISO/IEC HyTime standard [HyTime], which provides all the indirection you might need and can be used with XML documents; or they can develop a purpose-built addressing mechanism. HyTime has the disadvantage that there is little off-the-shelf support for it, and it is a complex mechanism that can be difficult to grasp [note 1]. Purpose-built solutions have the problem that you have to invent the solution and, regardless of what you do, it won’t be a standard (unless you can turn your purpose-built solution into a standard). In both cases, implementors are pretty much on the hook for implementing the pointer processing and resolution.

If the delivery form of the information uses XLink (or even just normal HTML anchor links), all authoring solutions will require that the pointers be transformed into direct URIs, so that aspect of the problem is the same, regardless of what approach is used for authoring.

A word about HyTime

I was, and still am, one of the authors of the HyTime standard. The HyTime standard, among other things, attempts to provide a complete solution for the SGML and XML representation of pointers of various kinds. It was developed to satisfy the requirements of some of the most complex information management use cases we knew of. It was developed in a culture that valued and expected monolithic, all-encompassing specifications that would be implemented by large enterprises to solve large problems. It was developed before the implications of the Web and Internet time were fully understood, when we assumed most implementations would be by vendors developing products to sell to big enterprises. Unfortunately, the world changed while we were developing the HyTime specification. HyTime 2 was published in 1997, the same year that XML was completed [note 2].

The HyTime specification is a large document, over 500 pages, of which about 200 relate directly to the linking and addressing features [note 3]. Because the HyTime standard could neither make assumptions about the world nor impose any constraints on SGML, it had to be quite abstract in order to provide ways to fully define the use of the various facilities. Thus, there are a lot of definitional underpinnings to the relatively simple linking features themselves. Unfortunately, it is difficult to understand how the linking facilities work without first gaining some understanding of the total framework.

Thus, even though the actual syntax needed to represent simple indirect addresses in HyTime is pretty simple and not hard to implement, it’s quite a challenge to get to the point where you can see the tree for the forest. This is painfully clear to me now.

Thus, while I’m quite proud of the technical achievement of HyTime — I think there are a number of really solid ideas and principles embodied in it, and I think the actual markup design is quite well suited to the requirements it aimed to satisfy — I am also cognizant of the limitations in the specification, both in the overall complexity of the abstract system it represents and the daunting nature of the specification itself.

By contrast, the XIndirect facility attempts to define the simplest possible mechanism that will satisfy the pointer management requirements of XML-based authoring support systems. My experience with HyTime is that, at the markup level, you don’t need all that much. The basic idea is simple. What complicates things are details of syntax and trying to cover too broad a scope of use cases or requirements.

Where HyTime had to define all of its supporting foundation itself, the XML family of specifications now provides that for us. For almost everything that is in HyTime, there is a reasonable XML analog, from XPointer for addressing, to SMIL [SMIL] for multimedia rendition [note 4]. This means that a facility such as XIndirect can focus entirely on the syntax and processing semantics without complicating itself with infrastructure basics. While I might quibble that some of the XML infrastructure components are either underspecified or overly complicated, in practice it matters not because you can usually see a clear path to a robust implementation, even if it’s not necessarily in terms of a mathematically complete abstraction.

The XIndirect facility

The XIndirect specification defines both an abstract data model and an XML-based representation syntax for indirectors. The defined representation syntax uses XPointers for addressing [note 5].

The XIndirect facility as defined is as simple as possible. It has exactly one purpose, to enable the XML representation of multi-step addresses in a way that allows them to be used with other pointing and linking facilities without any need to change those facilities. In particular, it is designed to unilaterally extend XLink and XInclude by providing a form of indirect address that those facilities do not themselves provide. One important implication of this approach is that systems that support XLink or XInclude do not need to step up to offer support for indirection simply to be conforming. Rather, support for indirection can be added to an XLink or XInclude processor without affecting the base processing. In practice, it is simply a matter of calling a different function to resolve XPointers. The result is the same for the link or the inclusion being processed: a flat list of the resources that are the ultimate targets.

This section serves as an annotated version of the XIndirect specification.

XIndirect abstract model

This section describes the abstract XIndirect facility, which defines the abstract data and processing model that governs the interpretation of the representation syntax.

In order to define the XIndirect processing and data model, it is necessary to first define the processing and data model for links that use direct pointers, as typified by the XLink specification. In this model there are “Linkers”, the things that represent semantic links to resources; “Pointers”, the things that provide the addressing “plumbing” needed to actually get the resources; and “Resources”, the things pointed to. These types are all defined formally and in more detail in the following sections. The XIndirect facility introduces a new data type, “Indirector”, which is like a Linker in that it points to resources, but unlike a Linker, it has no semantic other than indirection. The formal models also make it clear that Indirectors are a subtype of Resource, which means that it is meaningful in some cases to address Indirectors as resources, not as indirections. The XML syntax representation for XIndirect specifically provides for this case by letting a Linker or Indirector indicate how any Indirectors it points to directly should be treated: as Indirectors or as Resources.

The processing model section defines the basic semantics of the four types: Linker, Pointer, Resource, and Indirector. The data model section then formally establishes the relationships among those types.

Processing model

The XLink specification defines essentially three kinds of things: “linkers” (a term coined for use in this document), resources, and pointers to resources. A “linker” is a thing that utters a pointer [note 6] in order to address zero or more resources. The term “linker” is used here because the more correct term “pointer”, in the sense of “the thing doing the pointing,” conflicts with the term “pointer” in the sense of “an object that can be interpreted as the address of another object”. The use of the term “linker” here does not limit the applicability of the indirection facility to only those things that are semantically hyperlinks as defined by specifications such as XLink, HTML [HTML], and HyTime. In fact, a Linker, as defined here, may be a subcomponent of a complete hyperlink (e.g., the XLink locator element is, abstractly, an XIndirect Linker). The XIndirect facility is intended to be used with, and is usable with, any application of pointers regardless of the semantic of the thing doing the pointing.

Some definitions: A “resource” is a thing that can be addressed and, therefore, meaningfully linked to. A pointer is a construct that, when interpreted in terms of some data model, “returns” zero or more resources. In XLink, because it does not define a semantic for indirection, the resources pointed to by the pointer uttered by the linker are, by definition, the resources intended by the creator of the linker as the target of the link.

To represent indirection, a fourth kind of thing is needed: indirectors. An indirector is simply a resource that itself specifies a pointer. When addressed, the indirector is interpreted not as a resource but as a pointer to zero or more resources. By definition, indirectors have no semantic other than indirection.

The semantic of an indirector is that, by default, when an indirector is addressed, the indirector’s pointer is resolved and the result of that address is returned as the result of the first address. This means that, under normal circumstances, the use of indirection is transparent to the thing doing the initial pointing. The result of pointing to an indirector is a “compound address” composed of two or more steps.

An indirector is explicitly not a linker in that an indirector, by definition, has no stronger semantic than “Don’t look at me, look over there”. In particular, the use or non-use of indirectors cannot change the meaning of the linker that uses the address. That is, the hyperdocument constructed using indirect addresses must be semantically identical to the same hyperdocument constructed using direct addresses such that any or all indirect addresses can be replaced with the equivalent direct addresses without affecting the semantic interpretation of the hyperdocument in any way.

Note that there may be processing circumstances when indirectors need to be treated as resources in their own right. Thus, it must be possible to say, for a given pointer, whether any indirectors it addresses are to be treated as indirectors or as resources.

In addition, there may be processing circumstances where the set of indirectors that make up a compound address need to be made visible. Thus, indirection-aware processors should provide facilities for inspecting the indirectors that make up a given compound address.
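The semantics described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's processing model: the `STORE` table, the pointer strings, and the function name `resolve` are all hypothetical, the treatment flag is applied only to the indirectors addressed directly by the initial pointer, and cycle detection is omitted.

```python
# Hypothetical address space. Each entry is either a plain resource,
# ("resource", name), or an indirector, ("indirector", name, href).
STORE = {
    "#ind1": [("indirector", "ind1", "#target")],
    "#target": [("resource", "target")],
}


def resolve(pointer, treatment="as-indirector", trace=None):
    """Return the names of the resources a pointer ultimately addresses.

    If `trace` is a list, the indirectors passed through are appended to it,
    so a caller can inspect the steps of the compound address.
    """
    results = []
    for obj in STORE.get(pointer, []):
        if obj[0] == "indirector" and treatment == "as-indirector":
            if trace is not None:
                trace.append(obj[1])
            # Simplification: nested indirectors are always followed.
            results.extend(resolve(obj[2], "as-indirector", trace))
        else:
            results.append(obj[1])
    return results


steps = []
print(resolve("#ind1", trace=steps))    # indirection is transparent to the caller
print(steps)                            # the indirectors that were traversed
print(resolve("#ind1", "as-resource"))  # the indirector itself is returned
```

The first call returns the same result as pointing directly at `#target`, which is the sense in which indirection is transparent to the thing doing the initial pointing.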

Data model

The data model for XIndirect consists of four types: Linker, Pointer, Resource, and Indirector.

The following data model diagrams use the UML [UML] graphical syntax for static data models.

Linker, pointer, and resource data types

The Linker data type represents an object that ultimately references zero or more member resources, using a Pointer, as shown in Figure 1. A Resource is anything that can be addressed. A Pointer is a construct that can be interpreted as an address (e.g., an href attribute).

Figure 1: Linker data type

This diagram simply establishes the three types Linker, Resource, and Pointer, and indicates that Linkers are associated with some number of resources (here labeled “members”) through the use of Pointers. A given linker can have one or more Pointers. Each Pointer can point to zero or more resources. The annotation “<<derived>>” on the Linker-to-Resource relationship indicates that it is the effect of the Linker-to-Pointer-to-Resource relationship.

The purpose of this diagram is to clearly establish the distinction between the Pointers, which are plumbing, and the result of resolving the Pointers, which is the association of some set of Resources with the Linker. This model then allows us to introduce the Indirector type and show that the introduction of Indirectors only changes the plumbing — it does not change the ultimate relationship between the Linker and its associated target resources.

Indirector data type

The Indirector data type represents an object that serves only to point to another set of resources in order to establish an indirection, as shown in Figure 2.

Because Indirectors are themselves resources, the direct referents of an Indirector may be any combination of non-Indirector or Indirector resources.

Figure 2: Indirector data type

Figure 2 indicates both that an Indirector is a subtype of Resource (meaning that it can be addressed by a Pointer) and that an Indirector can be associated with one or more Pointers to address the Indirector’s direct referents. The direct referents are those resources the Indirector points to directly, as opposed to those it might point to indirectly.

Because Indirectors are also Resources, this picture implies the ability of one Indirector to point to another Indirector, creating a multi-step indirect address.

Thus, the Resources ultimately addressed by a Linker when Indirectors are present include all the non-Indirector Resources that are addressed anywhere along the multi-step address. Because the “members” role of the Linker-to-Resource relationship is a simple list, it indicates that the result of resolving any direct or indirect address will ultimately be a flat list of Resources.

Thus, the use or non-use of Indirectors does not disturb the final result as seen by the Linker: a flat list of resources. The details of how the Pointers are resolved are of no interest to the Linker, reinforcing the fact that pointing is just plumbing. It should also be clear from this that the details of the addressing have no effect on the semantics of the Linker and its relationship to its member resources.
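One way to read the data model is as a small set of classes. The sketch below is an informal rendering of the four types in Python, not a normative binding: the class and attribute names follow the text, but the representation of a Pointer as a list of already-resolved direct referents, the helper `_flatten`, and the sample names are assumptions made for illustration. Cycle detection is omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Anything that can be addressed."""
    name: str


@dataclass
class Pointer:
    """Addressing plumbing; here it simply holds its direct referents."""
    direct_referents: List[Resource] = field(default_factory=list)


@dataclass
class Indirector(Resource):
    """A Resource whose only job is to point somewhere else."""
    pointers: List[Pointer] = field(default_factory=list)


@dataclass
class Linker:
    pointers: List[Pointer] = field(default_factory=list)

    @property
    def members(self) -> List[Resource]:
        """Derived association: the flat list of non-Indirector resources."""
        return _flatten(self.pointers)


def _flatten(pointers: List[Pointer]) -> List[Resource]:
    flat: List[Resource] = []
    for pointer in pointers:
        for referent in pointer.direct_referents:
            if isinstance(referent, Indirector):
                flat.extend(_flatten(referent.pointers))
            else:
                flat.append(referent)
    return flat


chapter = Resource("chapter-3")
via = Indirector("ind-chapter", pointers=[Pointer([chapter])])
direct = Linker(pointers=[Pointer([chapter])])
indirect = Linker(pointers=[Pointer([via])])
print(direct.members == indirect.members)  # same members with or without indirection
```

The `members` property plays the role of the derived Linker-to-Resource association: it is computed from the Pointers and is the same whether or not an Indirector sits in between.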

Location paths

Figure 3 is a UML instance diagram showing a typical system of indirect addresses, starting from a Linker with a single Pointer. (In this diagram, the Pointer instances have been omitted but are implied by the associations labeled “direct_referent”.)

A non-indirect resource is a resource that is not interpreted as an Indirector. A location path is a non-indirect pointer followed by a sequence of one or more Indirectors, terminating in a set of zero or more non-indirect resources.

Figure 3: Location path instance

A given Indirector may point to another Indirector. When it does so, it forms a multi-step location path. Each Indirector forms a single step in the location path. The path extends from the initial reference by the Linker to the non-indirect Resources addressed by the last Indirector in the path. If a Linker or Indirector addresses multiple Indirectors, it creates a set of location paths, one for each terminal Indirector.

In Figure 3, there are two location paths rooted at Linker linker1. The first path consists of Indirector indir1 and Indirector indir2, terminating with Resource res1. The second path consists of Indirector indir1 and Indirector indir3, terminating with Resource res2.

From the point of view of linker1, the effective set of resources addressed is res1 and res2.

It is an error for an Indirector to occur twice in the same location path (which would create a cycle). It is not an error for the same Indirector to occur twice in a commonly-rooted set of location paths as long as it occurs at most once in any single location path within the set.

One implication of this picture, as well as of the data model in Figure 2, is that the processing required to resolve a location path could be quite involved, requiring a potentially unbounded set of recursions through “resolve Pointer” actions, cycle detection, and list construction. A practical implementation must provide time-out mechanisms to prevent overly-long resolutions from blocking a system, and so on. However, none of these implementation issues are intractable or particularly difficult to address from an engineering standpoint, especially within a closed system of reasonably narrow scope, such as a content management system that supports a technical documentation authoring community within an enterprise.
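To make the location path rules concrete, the following sketch enumerates the paths of the Figure 3 instance as the text describes it and rejects cycles. The `GRAPH` table, the `CycleError` class, and the `location_paths` function are hypothetical names introduced here. The sketch yields one entry per terminal resource, which is a simplification, and it does not implement any time-out.

```python
# Direct referents of each Indirector in Figure 3, as described in the text.
GRAPH = {
    "indir1": ["indir2", "indir3"],
    "indir2": ["res1"],
    "indir3": ["res2"],
}


class CycleError(Exception):
    """Raised when an Indirector occurs twice in one location path."""


def location_paths(referents, graph, path=()):
    """Yield (indirector_steps, terminal_resource) pairs."""
    for node in referents:
        if node in graph:                       # node is an Indirector
            if node in path:                    # already on this path: a cycle
                raise CycleError(" -> ".join(path + (node,)))
            yield from location_paths(graph[node], graph, path + (node,))
        else:                                   # non-indirect resource
            yield path, node


paths = list(location_paths(["indir1"], GRAPH))
for steps, resource in paths:
    print(" -> ".join(("linker1",) + steps + (resource,)))
```

Because the check looks only at the current path, the same Indirector may still appear in several commonly-rooted paths, as indir1 does here.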

XIndirect representation

This section describes the XML-based representation syntax for Indirectors in XML documents. It is designed to be as simple as possible. It also uses naming conventions already in wide use in other XML specifications, in particular, the use of “href” as the name of the pointer attribute.

NOTE:

The element type declarations used below should be interpreted as “meta” element type declarations in that they only apply to elements within the XIndirect name space. They in no way constrain either where XIndirect elements may appear in non-XIndirect elements or what other elements may appear within XIndirect elements. For example, the declared meta content model of the indirector element is “EMPTY”, meaning that it cannot contain any other XIndirect-defined elements. However, it does not constrain the presence of any other content. That is, an XIndirect processor will ignore, for the purpose of applying XIndirect semantics, any content of indirector elements. However, such content may be used for other purposes.

The indirector element

The indirector element represents an Indirector object. The required href attribute specifies the indirector’s pointer. Indirector elements may occur anywhere, including as the document element of documents consisting of a single indirector element. There is no significance with respect to this specification of the context in which indirector elements occur.

NOTE:

Nesting of indirector elements is disallowed in order to avoid potential confusion about the significance of such nesting and to provide for future refinements in which such nesting would have specific implications.

The indirector element may have non-XIndirect content, for example, an indication of the local purpose of the indirector, application-specific metadata, etc. Any such content is ignored by XIndirect processors for the purposes of interpreting the indirector element as an Indirector.

Attributes of the indirector element:

href

Specifies the Indirector’s pointer. Currently the only recognized addressing syntax is XPointer.

id

The unique identifier of the indirector element within the scope of the XML document that contains it.

The indirector element may also take the indirector treatment attributes.

<!ELEMENT indirector
   EMPTY
>
<!ATTLIST indirector
   href
     CDATA
     #REQUIRED
   id
     ID
     #IMPLIED
   %indirector-treatment-atts;
>
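As an illustration, a processor might read an indirector element along the following lines. This is a sketch using Python's standard `xml.etree.ElementTree` module. The namespace URI is the one bound to the `xindr` prefix in the test stylesheet's namespace declarations and is assumed here to be the XIndirect namespace; the sample document, the `note` child, the `manual.xml#install` pointer, and the function `read_indirector` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the xindr prefix in the test stylesheet.
XINDR = "http://www.isogen.com/papers/xindirection.xml"

doc = ET.fromstring(f"""
<xindr:indirector xmlns:xindr="{XINDR}"
    id="ind-install" href="manual.xml#install">
  <note>Redirects to the current installation chapter.</note>
</xindr:indirector>
""")


def read_indirector(elem):
    """Extract the XIndirect-relevant attributes; other content is ignored."""
    if elem.tag != f"{{{XINDR}}}indirector":
        raise ValueError("not an indirector element")
    href = elem.get("href")
    if href is None:
        raise ValueError("the href attribute is required")
    return {
        "id": elem.get("id"),
        "href": href,
        # Defaults taken from the indirector treatment attribute declarations.
        "indirector-treatment": elem.get("indirector-treatment", "as-indirector"),
        "max-hops": int(elem.get("max-hops", "0")),
    }


print(read_indirector(doc))
```

The `note` child stands in for non-XIndirect content: the function never looks at it, which mirrors the rule that such content is ignored for the purposes of interpreting the element as an Indirector.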

Indirector treatment attributes

Because Indirectors are also resources it is sometimes desirable or necessary to treat Indirectors as resources instead of as indirections. The indirector-treatment= and max-hops= attributes allow pointers to indicate how any addressed Indirectors are to be treated. These attributes may be specified on non-XIndirect Linker elements to allow them to directly address Indirectors as resources. Linker elements can also address Indirectors as resources indirectly by pointing to Indirectors that then point to the intended target Indirectors as resources.

indirector-treatment

Indicates whether any indirectors directly addressed by the Linker or Indirector are to be treated as resources or as Indirectors. The possible values are “as-indirector” or “as-resource”. When the value is “as-resource”, the indirector is not interpreted as an indirector and is instead returned directly. If a Linker has multiple pointers, the indirector-treatment= attribute applies to all of them. If there is a need to have different indirector treatment behavior for different pointers for a single Linker, an intermediate level of indirection must be used.

max-hops

Specifies the maximum number of indirections to be processed from a Linker or Indirector in the construction of a single location path. A value of zero (“0”) indicates that there is no defined limit (although XIndirect processors may impose their own limits). A value of one (“1”) indicates that, at most, one level of indirection should be processed. If the resource addressed by the last Indirector allowed by applying the max-hops value is itself an Indirector treated as an Indirector, then an empty node list is returned for that location path.

In the context of resolving a given location path, the first non-zero max-hops= value encountered in a location path governs ultimate resolution, such that subsequent non-zero max-hops= values are ignored for that resolution instance.

<!ENTITY % indirector-treatment-atts
 'indirector-treatment
     (as-indirector |
      as-resource)
     "as-indirector"
   max-hops
     CDATA
     "0"
'>
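The max-hops= rules can be sketched as a hop budget that is fixed by the first non-zero value met on a path and then counted down. This reflects one reading of the rules above rather than a definitive algorithm: the `INDIRECTORS` table, the chain a, b, c, and the `resolve` function with its `remaining` parameter are hypothetical, and cycle detection is omitted.

```python
# Hypothetical chain a -> b -> c -> res. Each indirector maps to
# (its direct referents, its own max-hops value).
INDIRECTORS = {
    "a": (["b"], 0),
    "b": (["c"], 0),
    "c": (["res"], 0),
}


def resolve(referents, max_hops=0, remaining=None):
    """Resolve referents; `remaining` is None until a non-zero max-hops is seen."""
    if remaining is None and max_hops > 0:
        remaining = max_hops          # the first non-zero value governs the path
    results = []
    for node in referents:
        if node not in INDIRECTORS:
            results.append(node)      # non-indirect resource: keep it
            continue
        if remaining == 0:
            continue                  # budget spent: this path yields nothing
        targets, own_max_hops = INDIRECTORS[node]
        left = None if remaining is None else remaining - 1
        results.extend(resolve(targets, own_max_hops, left))
    return results


print(resolve(["a"]))              # no limit
print(resolve(["a"], max_hops=3))  # three indirections are enough to reach res
print(resolve(["a"], max_hops=2))  # stops at an Indirector, so the path is empty
```

Because `remaining` is passed down each branch separately, the budget applies per location path rather than to the resolution as a whole.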

The indirectorset element

The indirectorset element contains zero or more indirector elements. It is provided as a convenience for grouping sets of indirectors together. It has no defined semantics other than containment. The use of indirectorset to contain indirector elements does not affect the interpretation of the indirectors in any way. The indirectorset element may be used as a document element or as a subelement within a larger document. The indirectorset element may contain any non-XIndirect content.

<!ELEMENT indirectorset
  (indirector*)
>
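Since indirectorset has no semantics beyond containment, a processor can treat it as a convenient place to collect indirectors into a lookup table. The sketch below assumes the namespace URI bound to the `xindr` prefix in the test stylesheet; the sample document, its `title` child, the identifiers, and the `index` variable are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the xindr prefix in the test stylesheet.
XINDR = "http://www.isogen.com/papers/xindirection.xml"

SET_XML = f"""
<xindr:indirectorset xmlns:xindr="{XINDR}">
  <title>Pointers for the 2.0 release</title>
  <xindr:indirector id="ind-install" href="manual.xml#install"/>
  <xindr:indirector id="ind-config" href="manual.xml#config"/>
</xindr:indirectorset>
"""

root = ET.fromstring(SET_XML)

# Only indirector elements matter; the title is non-XIndirect content.
index = {
    ind.get("id"): ind.get("href")
    for ind in root.iter(f"{{{XINDR}}}indirector")
}
print(index)
```

Keeping such a table in its own document is what allows the indirections to be edited, or replaced wholesale, without touching the documents that point to them.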

XIndirect implementation considerations

An XIndirect processor could be a direct extension of an existing XPointer implementation, or it could be a separate component that exposes a “resolvePointer()” method and returns lists of resources in whatever way the calling application expects them, such as a node list of DOM nodes.

An XIndirect processor could, optionally, provide services for inspecting the intermediate steps of location paths.

A content management system that relies on the use of indirectors could further extend the indirection processing, for example, to make it easy to swap in different sets of indirections for a given hyperdocument or implement application-specific URIs or pointer syntaxes.

XIndirect reference implementation

The following XSLT stylesheet provides a test implementation of the XIndirect facility. It uses XSLT, extended using the EXSLT [EXSLT] function mechanism, as well as the Saxon-defined “evaluate()” function, to resolve indirectors and produce a “debug” report showing both the ultimate results of each non-indirect pointer and the location paths represented by the indirector elements in the test document set. A set of simple test documents is also provided.

Overview of the XIndirect implementation

Most of the following stylesheet implements basic XPointer processing [note 7], which is plumbing that does not affect the way that indirectors are implemented. The functions that implement the indirection processing are the xindrf:resolve-xpointer() and xindrf:resolve-indirectors() functions.

The xindrf:resolve-xpointer() function resolves URIs to a set of direct referents. In a non-indirection-aware process, this would be sufficient, and the resulting node set would simply be returned as the final set of resources addressed by the initial reference. However, in an indirection-aware process, any indirectors in the direct referents must be resolved to their ultimate targets. Thus, for each node in the direct result set, xindrf:resolve-xpointer() calls xindrf:resolve-indirectors(), passing in the direct referent node and returning the node set returned by xindrf:resolve-indirectors().

The xindrf:resolve-indirectors() function evaluates each node in the input node list. If it is an indirector and the indirector treatment has not been set to “as-resource”, then xindrf:resolve-indirectors() calls xindrf:resolve-xpointer() on that node’s href= value, returning the result node list. If the input node is not an indirector, it is simply returned.

Note that the recursion is indirect in that the processing alternates between xindrf:resolve-xpointer() and xindrf:resolve-indirectors(). One implication of this is that the only difference between the indirect-aware system and a non-indirect-aware system is the call to xindrf:resolve-indirectors(). Everything else is the same. In particular, the XPointer processing itself is the same in both cases. In addition, the API for initial resolution of XPointer URIs is essentially the same: resolve-xpointer-url(String URI). The only other change would be adding optional parameters for specifying the indirector treatment and max-hops values.

This simple reference implementation does not implement the max-hops= attribute nor does it do cycle detection or provide a time-out mechanism. All of these features would be required in a production system in order to ensure that all resolution attempts completed or failed within a reasonable length of time.
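
As a rough indication of what the missing safeguards might look like, the following Python sketch adds a hop limit and cycle detection to a toy resolver. It is an assumption-laden illustration, not the behaviour defined by the XIndirect Note: the document model, the ResolutionError class, the default limit of 8, and the reading of max-hops as "number of indirectors traversed on one path" are all hypothetical. A time-out mechanism is not shown.

```python
from typing import Any, Dict, FrozenSet, List

# Toy documents (assumed model): a node with an "href" entry is an indirector.
DOC: Dict[str, Dict[str, Any]] = {
    "para-2": {"id": "para-2"},
    "addr-02": {"id": "addr-02", "href": ["para-2"]},
    "addr-03": {"id": "addr-03", "href": ["addr-02"]},
}
CYCLIC: Dict[str, Dict[str, Any]] = {
    "a": {"id": "a", "href": ["b"]},
    "b": {"id": "b", "href": ["a"]},
}


class ResolutionError(Exception):
    """Raised when resolution is abandoned (cycle found or hop limit hit)."""


def resolve(target_ids: List[str],
            doc: Dict[str, Dict[str, Any]],
            max_hops: int = 8,
            _hops: int = 0,
            _seen: FrozenSet[str] = frozenset()) -> List[Dict[str, Any]]:
    """Resolve ids to ultimate targets, following at most max_hops indirectors."""
    result: List[Dict[str, Any]] = []
    for tid in target_ids:
        node = doc.get(tid)
        if node is None:
            continue                      # unresolvable pointer: no contribution
        if "href" not in node:
            result.append(node)           # ordinary resource: an ultimate target
            continue
        if tid in _seen:                  # this indirector is already on the path
            raise ResolutionError(f"cycle detected at indirector {tid!r}")
        if _hops >= max_hops:
            raise ResolutionError(f"hop limit {max_hops} reached at {tid!r}")
        result.extend(
            resolve(node["href"], doc, max_hops, _hops + 1, _seen | {tid}))
    return result


print([n["id"] for n in resolve(["addr-03"], DOC)])
```

Tracking the set of indirectors on the current path, rather than a global visited set, lets two different pointers legitimately share an indirector while still catching loops.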

XIndirect test stylesheet

This stylesheet processes either of the documents shown in Subsection “XIndirect test documents” and generates a debug report as an HTML document. It requires an XSLT 1.0 implementation that implements the “functions” and “common” modules of EXSLT and provides the saxon:evaluate() extension function (e.g., Saxon 6.5). Pointer resolution uses a partial implementation of XPointer: it handles a single xpointer() part, bare names, and child sequences, but not multi-part XPointers or other schemes.

<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/TR/xlink"
  xmlns:xindr="http://www.isogen.com/papers/xindirection.xml"
  xmlns:xindrf="http://www.isogen.com/functions/xindirection"
  xmlns:saxon="http://icl.com/saxon"
  xmlns:func="http://exslt.org/functions"
  xmlns:fcommon="http://exslt.org/common"
  extension-element-prefixes="func xindrf"

>
<!--

     This stylesheet tests the XIndirect reference implementation in
     xindirect-functions.xsl. It is designed to be run against the
     simple test documents testdoc-01.xml and testdoc-02.xml provided
     with this stylesheet. It produces a "debug" rendering of the
     hyperdocuments showing both the ultimate results of resolving
     indirect addresses as well as all intermediate results.

     Author: W. Eliot Kimber, eliot@isogen.com

  -->

<!-- XIndirection function definitions -->

<func:function name="xindrf:resolve-xpointer-url">
  <!-- Given an element that exhibits an href attribute,
       attempts to resolve the URL and XPointer (if present)
       into a result node list.

       If there is no fragment identifier, acts as though
       the fragment identifier "#/" had been specified,
       returning the document root.
    -->
  <xsl:param name="pointer-node"/><!-- The Element node that exhibits the 
  XPointer to be resolved -->
  <xsl:param name="indirector-treatment">as-indirector</xsl:param>
  <xsl:variable name="indirector-treatment-str" 
                   select="string($indirector-treatment)"/>
  <xsl:variable name="href" select="$pointer-node/@href"/>
  <xsl:choose>
    <xsl:when test="starts-with($href,'#')">
      <xsl:variable name="fragid">
        <xsl:value-of select="substring($href, 2)"/>
      </xsl:variable>
      <xsl:variable name="xpointer" select="xindrf:fragid2xpointer($fragid)"/>
      <!-- NOTE: error checking and reporting is done by resolve-xpointer -->
      <xsl:variable name="rns" 
                       select="xindrf:resolve-xpointer($pointer-node, $xpointer, 
                       $indirector-treatment-str)"/>
      <xsl:choose>
        <xsl:when test="string($rns) = ''">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:when test="fcommon:object-type($rns) != 'node-set'">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:otherwise>
          <func:result select="$rns"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="url">
        <xsl:variable name="cand-url" 
                         select="substring-before($pointer-node/@href, '#')"/>
        <xsl:choose>
          <xsl:when test="$cand-url = ''">
            <xsl:value-of select="$href"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$cand-url"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="cand-xpointer">
        <xsl:value-of select="substring-after($href, '#')"/>
      </xsl:variable>
      <xsl:variable name="xpointer">
        <xsl:choose>
          <xsl:when test="$cand-xpointer = ''">
            <xsl:value-of select="string('/')"/><!-- No fragment identifier: 
            select the root node of the target document -->
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="xindrf:fragid2xpointer($cand-xpointer)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="location-source-node" 
                       select="document($url, $pointer-node)"/>
      <xsl:variable name="rns" 
                       select="xindrf:resolve-xpointer($location-source-node, $xpointer,
      $indirector-treatment-str)"/>
      <xsl:choose>
        <xsl:when test="string($rns) = ''">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:when test="fcommon:object-type($rns) != 'node-set'">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:otherwise>
          <func:result select="$rns"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>

</func:function>

<func:function name="xindrf:resolve-xpointer">
  <!-- Resolves an xpointer in the context of some location source node.

       The location source is either the pointer, if the URL was just
       an XPointer, or it's the document element of the document addressed by
       the resource part of the URL.
    -->
  <xsl:param name="location-source-node"/>
  <xsl:param name="xpointer"/>
  <xsl:param name="indirector-treatment-str"/>
  <xsl:for-each select="$location-source-node">
    <!-- Set the context node to the location source node so that
         saxon:evaluate() resolves relative location paths against it -->
    <xsl:choose>
      <xsl:when test="$xpointer != ''">
        <xsl:variable name="direct-result-set" select="saxon:evaluate($xpointer)"/>
        <xsl:choose>
          <xsl:when test="string($direct-result-set) = ''">
            <xsl:message>XIndirect warning: XPointer "<xsl:value-of 
            select="$xpointer"/>" did not address any nodes.</xsl:message>
            <!-- Return an empty node set, as the other failure branches do -->
            <func:result select="/.."/>
          </xsl:when>
          <xsl:when test="fcommon:object-type($direct-result-set) != 'node-set'">
            <xsl:message>XIndirect warning: XPointer "<xsl:value-of 
            select="$xpointer"/>" did not evaluate to a node set.</xsl:message>
            <func:result select="/.."/>
          </xsl:when>
          <xsl:otherwise>
            <func:result select="xindrf:resolve-indirectors($direct-result-set, 
                                    $indirector-treatment-str)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message>XIndirect error: $xpointer value is '' in 
        resolve-xpointer.</xsl:message>
        <func:result select="/.."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>

</func:function>

<func:function name="xindrf:resolve-indirectors">
  <!-- Given a node set of potential pointers, recurses through the list,
       resolving any indirections.

       The key to this function is the use of the node-set-union operator
       (|) to recursively construct the result node list.
    -->
  <xsl:param name="pointer-node-set" select="/.."/>
  <xsl:param name="indirector-treatment">as-indirector</xsl:param>
  <xsl:variable name="indirector-treatment-str" 
                   select="string($indirector-treatment)"/>
  <xsl:choose>
    <xsl:when test="$pointer-node-set">
      <xsl:variable name="car" select="$pointer-node-set[1]"/>
      <xsl:variable name="cdr" select="$pointer-node-set[position() > 1]"/>
      <xsl:choose>
        <xsl:when test="$car[self::xindr:indirector] and
                        ($indirector-treatment-str != 'as-resource')">
          <xsl:variable name="rns"
                           select="xindrf:resolve-xpointer-url($car, 
                           $car/@indirector-treatment) | 
                           xindrf:resolve-indirectors($cdr, 
                           $indirector-treatment-str)"/>
          <func:result select="$rns"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="rns"
              select="$car | 
                      xindrf:resolve-indirectors($cdr, $indirector-treatment-str)"/>
          <func:result select="$rns"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <func:result select="/.."/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>

<func:function name="xindrf:fragid2xpointer">
  <xsl:param name="fragid"/>
  <!-- Given a fragment identifier string, attempts to interpret it as an 
  XPointer. -->
  <!-- NOTE: does not:
       - Handle multi-part XPointers: "#xpointer(foo)xpointer(bar)"
       - Skip non-xpointer schemes

       Doing this would require more sophisticated string processing than I 
       can reasonably do in XSLT.
    -->
  <xsl:choose>
    <xsl:when test="starts-with($fragid, 'xpointer(')">
      <xsl:variable name="first-part" 
                       select="substring-after($fragid, 'xpointer(')"/>
      <xsl:variable name="len" select="(string-length($first-part) - 1)"/>
      <xsl:variable name="xpointer" select="substring($first-part,1,$len)"/>
      <func:result select="$xpointer"/>
    </xsl:when>
    <xsl:when test="not(contains($fragid, '/')) and
                    not(contains($fragid, '[')) and
                    not(contains($fragid, '*')) and
                    not(contains($fragid, '@'))">
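      <!-- Probably a bare name. The name is quoted so that id() receives
           a string rather than a location path. -->
      <func:result select="concat(&quot;id('&quot;, $fragid, &quot;')&quot;)"/>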
    </xsl:when>
    <xsl:when test="contains($fragid, '/') and
                    not(contains($fragid, '[')) and
                    not(contains($fragid, '*')) and
                    not(contains($fragid, '@'))">
      <!-- Probably a child sequence -->
      <xsl:variable name="barename" select="substring-before($fragid, '/')"/>
      <xsl:choose>
        <xsl:when test="$barename = '' and
                        contains(translate($fragid,
                                     'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                    '^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^'),
                                 '^')">
          <func:result select="xindrf:xpointer-error($fragid)"/>
        </xsl:when>
        <xsl:when test="$barename = ''">
          <xsl:message>fragid='<xsl:value-of select="$fragid"/>'</xsl:message>
          <xsl:variable name="childseq"
                           select="xindrf:build-child-sequence($fragid)"/>
          <func:result select="$childseq"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="idref" select="concat('id(', $barename, ')')"/>
          <xsl:variable name="xpointer-childseq"
                           select="substring($fragid, 
                           (string-length($barename) + 1))"/>
          <xsl:choose>
            <xsl:when test="contains(translate($xpointer-childseq,
                                     'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                    '^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^'),
                                     '^')">
             <func:result select="xindrf:xpointer-error($fragid)"/>
            </xsl:when>
            <xsl:otherwise>
              <!-- Quote the bare name so that id() receives a string -->
              <func:result select="concat(&quot;id('&quot;, $barename, &quot;')&quot;,
                                    xindrf:build-child-sequence($xpointer-childseq))"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <func:result select="xindrf:xpointer-error($fragid)"/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>

<func:function name="xindrf:build-child-sequence">
  <xsl:param name="xptr-childseq"/>
  <xsl:choose>
    <xsl:when test="not(starts-with($xptr-childseq, '/'))">
      <func:result select="xindrf:xpointer-error($xptr-childseq)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="temp" 
                       select="substring($xptr-childseq, 2)"/>
                       <!-- strip leading "/" -->
      <func:result select="xindrf:construct-child-sequence($temp)"/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>

<func:function name="xindrf:construct-child-sequence">
  <xsl:param name="xptr-child-seq"/>
  <xsl:param name="xpath-child-seq"/>
  <!-- XPath step for the first child number in the remaining sequence -->
  <xsl:variable name="child-num">
    <xsl:choose>
      <xsl:when test="contains($xptr-child-seq, '/')">
        <xsl:value-of select="concat('/*[', 
                                 substring-before($xptr-child-seq, '/'), ']')"/>
      </xsl:when>
      <xsl:when test="$xptr-child-seq = ''"/><!-- nothing left to translate -->
      <xsl:otherwise>
        <xsl:value-of select="concat('/*[', $xptr-child-seq, ']')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="rest" select="substring-after($xptr-child-seq, '/')"/>
  <xsl:choose>
    <xsl:when test="$rest = ''">
      <!-- Last step: append it so that it is not dropped from the result -->
      <func:result select="concat($xpath-child-seq, $child-num)"/>
    </xsl:when>
    <xsl:otherwise>
      <func:result select="xindrf:construct-child-sequence($rest, 
                              concat($xpath-child-seq, $child-num))"/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>

<func:function name="xindrf:xpointer-error">
  <!-- Reports an XPointer error and returns "/.." -->
  <xsl:param name="fragid"/>
  <xsl:message
>XPointer error: fragment identifier "<xsl:value-of select="$fragid"/>" is not 
a valid XPointer.
              Returning "/.." as XPath to resolve (empty node set)</xsl:message>
  <func:result select="concat('/', '..')"/>
</func:function>
<xsl:output
  method="xml"
  indent="no"
  omit-xml-declaration="no"
  encoding="UTF-8"
/>

<!-- End of XIndirect function definitions -->

<xsl:template match="/">
  <html>
    <body>
      <div>
      <hr/>
      <h2>Input Document</h2>
      <pre>
      <xsl:apply-templates mode="echo-markup"/>
      </pre>
      </div>
      <div>
      <hr/>
      <h2>Debug Report</h2>
      <xsl:apply-templates select="//links"/>
      <xsl:apply-templates select="//xindr:indirectorset"/>
      <xsl:apply-templates select="//paras"/>
      </div>
    </body>
  </html>
</xsl:template>

<xsl:template match="links">
  <div>
    <h2>Links</h2>
    <table border="1" width="100%">
      <xsl:apply-templates/>
    </table>
  </div>
</xsl:template>

<xsl:template match="xlink:simple">
  <tr>
    <td><a name="{generate-id()}"
      /><xsl:text>[</xsl:text
      ><xsl:value-of select="generate-id()"
      /><xsl:text>] Initial pointer: </xsl:text>
      <xsl:value-of select="@href"/><xsl:text>: </xsl:text>
      <xsl:apply-templates/>
      <br/><xsl:text>Direct targets: </xsl:text>
      <xsl:apply-templates
          select="xindrf:resolve-xpointer-url(., 'as-resource')"
          mode="generate-link-to"/>
    </td>
    <td>
      <xsl:text>Ultimate targets: </xsl:text>
      <xsl:variable name="members" select="xindrf:resolve-xpointer-url(.)"/>
      <xsl:for-each select="$members">
        <br/>
        <a href="#{generate-id()}"
        ><xsl:value-of select="generate-id()"
        /></a>: <code><xsl:apply-templates select="." mode="echo-markup"/></code>
      </xsl:for-each>
    </td>
  </tr>
</xsl:template>

<xsl:template match="paras">
  <div>
    <h2>Paragraphs</h2>
    <xsl:apply-templates/>
  </div>
</xsl:template>

<xsl:template match="para">
  <p
    ><a name="{generate-id()}"
    /><xsl:text>[</xsl:text
    ><xsl:value-of select="generate-id()"
    /><xsl:text
    >]</xsl:text
  ><xsl:apply-templates
  /></p>
</xsl:template>

<xsl:template match="xindr:indirectorset">
  <div>
    <h2>Indirectors</h2>
    <table border="1" width="100%">
      <xsl:apply-templates/>
    </table>
  </div>
</xsl:template>

<xsl:template match="xindr:indirector">
  <xsl:variable name="direct-target" 
  select="xindrf:resolve-xpointer-url(., 'as-resource')"/>
  <tr>
    <td><a name="{generate-id()}"
      /><xsl:text>[</xsl:text
      ><xsl:value-of select="generate-id()"
      /><xsl:text>] Pointer: '</xsl:text
      ><xsl:value-of select="@href"
      /><xsl:text>'. </xsl:text
      ><br/><xsl:text>Comment: </xsl:text
      ><xsl:apply-templates/>
    </td>
    <td>
      <xsl:text>Direct targets: </xsl:text
      >
      <xsl:apply-templates select="$direct-target" mode="generate-link-to"/>
    </td>
  </tr>
</xsl:template>

<xsl:template match="*" mode="generate-link-to">
    <br/>
    <a href="#{generate-id()}"
    ><xsl:value-of select="generate-id()"
    /></a>: <code><xsl:apply-templates select="." mode="echo-markup"/></code>
</xsl:template>

<xsl:template name="echo-element-markup">
  <xsl:text>&lt;</xsl:text><xsl:value-of select="name()"/>
  <xsl:for-each select="./attribute::*">
    <xsl:text> </xsl:text><xsl:value-of select="name()"/><xsl:text>="</xsl:text>
    <xsl:value-of select="."/><xsl:text>"</xsl:text>
  </xsl:for-each>
  <xsl:text>&gt;</xsl:text>
  <xsl:apply-templates mode="echo-markup"/>
  <xsl:text>&lt;/</xsl:text><xsl:value-of select="name()"/><xsl:text>&gt;</xsl:text>
</xsl:template>

<xsl:template match="*" mode="echo-markup">
  <xsl:call-template name="echo-element-markup"/>
</xsl:template>

</xsl:stylesheet>
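
The fragment-identifier handling in xindrf:fragid2xpointer() is easier to follow in a compact model. The Python sketch below shows the intended mapping from fragment identifiers to XPath expressions; it is not a transliteration of the stylesheet. The function names and the regular expression used to approximate an XML name are assumptions made for this example, and the sketch quotes bare names inside id() because XPath needs a string argument there.

```python
import re

# Rough approximation of an XML name (assumption: ASCII-style names only).
NAME = r"[A-Za-z_][\w.\-]*"
CHILD_SEQ = r"(?:/[1-9][0-9]*)+"


def child_sequence_to_xpath(child_seq: str) -> str:
    """Translate a child sequence such as "/1/2" into "/*[1]/*[2]"."""
    if not re.fullmatch(CHILD_SEQ, child_seq):
        raise ValueError(f"not a child sequence: {child_seq!r}")
    steps = child_seq.strip("/").split("/")
    return "".join(f"/*[{step}]" for step in steps)


def fragid_to_xpath(fragid: str) -> str:
    """Interpret a fragment identifier as a single XPath expression."""
    if fragid.startswith("xpointer(") and fragid.endswith(")"):
        return fragid[len("xpointer("):-1]          # unwrap xpointer(...)
    if re.fullmatch(NAME, fragid):
        return f"id('{fragid}')"                    # bare name
    match = re.fullmatch(f"({NAME})?({CHILD_SEQ})", fragid)
    if match:
        prefix = f"id('{match.group(1)}')" if match.group(1) else ""
        return prefix + child_sequence_to_xpath(match.group(2))
    raise ValueError(f"not a valid XPointer: {fragid!r}")


for fragid in ("xpointer(//para[1])", "an-id-value", "/1/2", "intro/3"):
    print(fragid, "=>", fragid_to_xpath(fragid))
```

Like the stylesheet, this model rejects bare XPaths such as "/foo/bar/baz" and does not handle multi-part XPointers or other schemes.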

XIndirect test documents

This test set consists of two documents, testdoc-01.xml and testdoc-02.xml. The testdoc-01.xml document is completely self-contained. It demonstrates different configurations of pointers and indirectors, as well as failure and exception conditions. The testdoc-02.xml document demonstrates cross-document indirect links from itself to testdoc-01.xml.

These documents use XML IDs but do not use the id() function, in order to avoid problems with DTD-unaware processes (including processing of intermediary result trees within an XSLT transform). The //*[@id='foo'] pattern is reliable in all processing contexts as long as all ID-type attributes have the name “id”, which is a common convention and the convention used in these samples.
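
The attribute-based pattern works with any DTD-unaware parser. As a small illustration, the following Python sketch uses the standard library's ElementTree, which reads no DTD and therefore has no notion of ID-typed attributes, to locate an indirector by its “id” attribute. The XML is a cut-down copy of testdoc-01.xml; the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of testdoc-01.xml. No DTD is present, so nothing declares
# the "id" attribute to be of type ID.
SOURCE = """<xindrtest
  xmlns:xindr="http://www.isogen.com/papers/xindirection.xml">
<paras>
<para foo="bar">This is the first para</para>
</paras>
<xindr:indirectorset>
<xindr:indirector id="addr-01"
  href="#xpointer(/*/paras/para[1])">pointer to para 1</xindr:indirector>
</xindr:indirectorset>
</xindrtest>"""

root = ET.fromstring(SOURCE)

# ElementTree supports a subset of XPath that includes this predicate form.
matches = root.findall(".//*[@id='addr-01']")
for element in matches:
    print(element.tag, element.get("href"))
```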

testdoc-01.xml

<?xml version="1.0"?>
<xindrtest
  xmlns:xindr="http://www.isogen.com/papers/xindirection.xml"
  xmlns:xlink="http://www.w3.org/TR/xlink"
>
<links>
<xlink:simple href="#xpointer(//*[@id='addr-01'])">indirect link to 
para 1</xlink:simple>
<xlink:simple href="#xpointer(//*[@id='addr-02'])">indirect link to 
para 2</xlink:simple>
<xlink:simple href="#xpointer(//*[@id='addr-03'])">double indirect link to 
para 2</xlink:simple>
<xlink:simple href="#xpointer(//*[@id='addr-04'])">2nd double indirect link 
to para 2</xlink:simple>
<xlink:simple href="#xpointer(//para[position() &lt; 3])">direct link to both 
the paras</xlink:simple>
<xlink:simple href="#xmlns(xindr=http://www.isogen.com/papers/xindirection.xml)
    xpointer(//xindr:indirector[position() &lt; 3])">link to paras 1 
    and 2</xlink:simple>
<xlink:simple href="#xmlns(xindr=http://www.isogen.com/papers/xindirection.xml)
    xpointer(//xindr:indirector[position() &lt; 4])">link to paras 1, 2, 
    and 3</xlink:simple>
<xlink:simple href="#xpointer(//para/@foo)">direct link to foo attribute of 
para 1</xlink:simple>
<xlink:simple href="#xpointer(//para/@foo='bar')">invalid 
xpointer</xlink:simple>
<xlink:simple href="#an-id-value">invalid xpointer (bare name with no 
DTD)</xlink:simple>
<xlink:simple href="#//*[@id='addr-01']">Invalid URL (bare XPath as 
fragment ID)</xlink:simple>
<xlink:simple href="#/foo/bar/baz">Invalid XPointer (bare XPath)</xlink:simple>
</links>
<paras>
<para foo="bar">This is the first para</para>
<para>This is the second para</para>
<para>This is the third para</para>
</paras>
<xindr:indirectorset>
<xindr:indirector id="addr-01"
  href="#xpointer(/*/paras/para[1])">pointer to para 1</xindr:indirector>
<xindr:indirector id="addr-02"
  href="#xpointer(/*/paras/para[2])">pointer to para 2</xindr:indirector>
<xindr:indirector id="addr-05"
  href="#xpointer(/*/paras/para[3])">pointer to para 3</xindr:indirector>
<xindr:indirector id="addr-03"
  href="#xmlns(xindr=http://www.isogen.com/papers/xindirection.xml)
        xpointer(../xindr:indirector[2])">pointer to indirector 
        "addr-02"</xindr:indirector>
<xindr:indirector id="addr-04"
  href="#xmlns(xindr=http://www.isogen.com/papers/xindirection.xml)
         xpointer(//xindr:indirectorset/xindr:indirector[2])">2nd pointer to 
         indirector "addr-02"</xindr:indirector>
</xindr:indirectorset>
</xindrtest>

testdoc-02.xml

<?xml version="1.0"?>
<xindrtest
  xmlns:xindr="http://www.isogen.com/papers/xindirection.xml"
  xmlns:xlink="http://www.w3.org/TR/xlink"
>
<links>
<xlink:simple href="./testdoc-01.xml">direct link to doc element of 
doc 1</xlink:simple>
<xlink:simple href="#xpointer(//*[@id='addr-02'])">indirect link to doc 
element of doc 1</xlink:simple>
<xlink:simple href="./testdoc-01.xml#xpointer(//*[@id='addr-01'])">indirect 
link to para 1 in doc1</xlink:simple>
<xlink:simple href="#xpointer(//*[@id='addr-01'])">indirect link to para 1 
in doc1 through indirector in this doc.</xlink:simple>
</links>
<xindr:indirectorset>
<xindr:indirector id="addr-01"
  href="./testdoc-01.xml#xpointer(/*/paras/para[1])">pointer to para 1 in 
  doc1</xindr:indirector>
<xindr:indirector id="addr-02"
  href="./testdoc-01.xml">pointer to doc elem of doc 1</xindr:indirector>
</xindr:indirectorset>
</xindrtest>
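
The href values in testdoc-02.xml combine an optional resource part with an optional fragment identifier, and xindrf:resolve-xpointer-url() begins by separating the two. The Python sketch below models only that first step; the HrefParts type and the function names are assumptions made for this example.

```python
from typing import NamedTuple


class HrefParts(NamedTuple):
    resource: str   # "" means the document that contains the pointer
    fragment: str   # "" means no fragment identifier was given


def split_href(href: str) -> HrefParts:
    """Split an href at the first '#' into resource and fragment parts."""
    resource, _, fragment = href.partition("#")
    return HrefParts(resource, fragment)


def describe(href: str) -> str:
    """Summarise where resolution starts and what is evaluated there."""
    parts = split_href(href)
    where = parts.resource or "(same document)"
    # With no fragment, the stylesheet evaluates "/" in the target document.
    what = parts.fragment or "/"
    return f"{where} -> {what}"


for href in ("./testdoc-01.xml",
             "#xpointer(//*[@id='addr-01'])",
             "./testdoc-01.xml#xpointer(/*/paras/para[1])"):
    print(describe(href))
```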

Future directions

The next step we hope to undertake is the integration of XIndirect processing into an XML content management system in order to gain more practical experience using this simple-but-powerful indirection facility. This might include using XIndirect to implement the referent tracking documents approach to managing versioned hyperlinks, as defined in [RefTrack].

Another area for research is the support for dynamic application of specific indirectors based on resolution-time parameters, for example, to select different versions of a target resource.

Notes

1.

This difficulty can exist even though the part of HyTime that is needed for supporting the indirect addressing needed for authoring is quite a small fraction of the standard.

2.

While the XML specification was published as a recommendation in early 1998, it was essentially complete in late 1997 when we announced it at the XML 1997 conference.

3.

The rest of the specification covers facilities such as scheduling and rendition, additional facilities that effectively extended SGML, and appendixes with full DTD listings and the like.

4.

And there are a few things that we didn’t realize we needed, such as a standard API for accessing our fundamental data model.

5.

However, there is no particular magic to XPointers for this purpose. There are any number of other possible representation syntaxes. In particular, the indirect addressing features of HyTime (clause 7) represent a conforming alternative representation syntax for the XIndirect abstract data model.

6.

In the metaphorical sense, this utterance is providing a pointer, in some syntax, to a processor that will resolve the pointer, as though it were an order to perform a task, just as an executive might bark to his assistant “Get me Smithers on the phone”.

7.

This implementation reflects the XPointer design that preceded the XPointer Framework. However, an updated implementation that supported the XPointer Framework would be functionally equivalent to this one and would not change the nature of the XIndirect-specific features.


Bibliography

[Heintz] Heintz, John D. Versioned Hyperdocuments: Support for Lifecycle Models. In Proceedings of Extreme Markup Languages 2002. Available online at http://www.isogen.com/downloads/white_papers/white_papers.jsp.

[HTML] W3C. Hypertext Markup Language (HTML). Various versions developed and published by the W3C. See http://www.w3.org/MarkUp/.

[HyTime] International Organization for Standardization (ISO). ISO/IEC 10744:1997 Hypermedia/Time-Based Structuring Language (HyTime). Geneva, Switzerland. 1997. Available online at http://www.ornl.gov/sgml/wg8/docs/n1920/.

[RefTrack] Kimber, W. Eliot, Steve Newcomb, and Peter Newcomb. Version Management as Hypertext Application: Referent Tracking Documents. In Proceedings of Markup Technologies 1999. Chicago, Ill. Available online at http://www.isogen.com/papers/ref-track-docs-paper.pdf.

[SMIL] SMIL specification.

[UML] UML specification.

[XIncl] W3C. XML Inclusions (XInclude) Version 1.0. Candidate Recommendation. Published by the W3C, September 2002. See http://www.w3.org/TR/xinclude/.

[XInd] W3C. XML Indirection Facility Note. Published by the W3C, June 2003. See http://www.w3.org/TR/NOTE-XIndirect.

[XLink] W3C. XML Linking Language (XLink) Recommendation. Published by the W3C, 2000. See http://www.w3.org/TR/xlink/.

[XPath] W3C. XML Path Language (XPath) Version 1.0 Recommendation. Published by the W3C, November 1999. See http://www.w3.org/TR/xpath.

[XPointer] W3C. XPointer Framework, see http://www.w3.org/TR/xptr-framework/. XPointer element() Scheme, see http://www.w3.org/TR/xptr-element/. XPointer xmlns() Scheme, see http://www.w3.org/TR/xptr-xmlns/. XPointer xpointer() Scheme, see http://www.w3.org/TR/xptr-xpointer/. All published by the W3C, 2002.

[XSLT] W3C. XSL Transformations (XSLT) 1.0 Recommendation. Published by the W3C, November 1999. See http://www.w3.org/TR/xslt.


