RxPath: a mapping of RDF to the XPath Data Model

Adam Souzis
asouzis@users.sf.net

Abstract

RxPath is a deterministic, comprehensive mapping from the RDF abstract syntax to the XPath data model; it allows XPath expressions (designed to access XML data) to access RDF nodes. RxPath allows the full range of XPath languages, including XSLT, XUpdate, Schematron, XForms, etc., to be used to access and present RDF data. RxPath is designed to be amenable to efficient processing and intuitive understanding, and to encourage the adoption of RDF. RxPath is compared with other approaches, including the use of XPath with XML serializations of RDF, XPath-like query languages, and the use of XPath/XSLT extensions.

Keywords: Mapping; RDF; Querying; XPath; XSLT

Adam Souzis

Adam Souzis is founder of the Rhizome project (www.liminalzone.org). Before that he was co-founder and CTO of content distribution software company Kinecta Corp. For the last decade Adam has been creating new internet technology for startups such as General Magic, NetObjects, and Stellent.

Adam Souzis [Liminal Systems]

Extreme Markup Languages 2006® (Montréal, Québec)

Copyright © 2006 Adam Souzis. Reproduced with permission.

Introduction

There have been several attempts at using XPath and XSLT to access and present RDF data, such as defining an XPath-like query language or XPath extension functions for XSLT. Unlike other attempts, RxPath is defined by a deterministic mapping from the RDF abstract syntax to the XPath 1.0 data model. This approach avoids the limitations of using XPath on XML serializations of RDF while allowing the full range of XPath-based languages to be used to query and manipulate RDF models -- such as XSLT for presentation and transformation, XUpdate for modification, Schematron for validation, and XForms for both presentation and modification -- without having to make any syntactic changes to those languages. An open source implementation and a formal specification[RxPath] of RxPath are available.

Motivations and Requirements

The primary motivation for designing RxPath was to make RDF easier to use and learn, especially for developers familiar with XML, thus aiding RDF's widespread adoption. This led to the following design goals:

  • Provide RDF with the wide range of tools available for XML, reusing mature and widely understood XML languages. For instance, content in an RDF model can be extracted and formatted using a standard XSLT stylesheet, without relying on extension functions or elements. This stylesheet can be authored, tested and debugged with standard XSLT development tools.
  • Reduce the learning curve and conceptual impedance between RDF and XML by allowing an RDF model to be visualized much like a standard XML DOM.
  • Ease the migration and integration of RDF with existing XML-based web applications.
  • Provide a concise, easy to use query language that fits well with RDF's graph-based model.

RxPath was designed with the following requirements in mind:

  • MUST be syntactically identical to XPath 1.0.
  • MUST be able to represent any valid RDF graph.
  • All information from the RDF Model MUST be available in the RxPath Data Model.
  • Deterministic - there MUST be only one RxPath data model representation for a given RDF model.
  • RxPath expressions SHOULD be amenable to static analysis. For example, it should be possible to determine for any step in an XPath expression its corresponding position in an RDF triple.
  • RxPath expressions SHOULD enable efficient processing. For example, the descendant axis should not imply an exhaustive search through the entire RDF model.
  • RxPath expressions SHOULD conform closely to the user's expectations of how XPath expressions behave.
  • RxPath expressions SHOULD provide intuitive navigation of an RDF model visualized as a graph.

Brief description of RxPath

RDF (Resource Description Framework) provides an abstract model for describing information about anything that can be represented by a URI. At its core, the RDF model is very simple: it consists of a set of resources (an abstract notion in RDF), which have properties; each property has a value which is either a resource or a literal string. Both resources and property names are URIs. Thus an RDF model can be described as a set of sentences (or statements), where each statement could look like: <http://example.com#resourceA> <http://example.com#has-a> "a literal value". The positions within a sentence are referred to as its subject, predicate, and object, respectively. Alternatively, you can visualize an RDF model as a labeled directed graph where the graph's nodes are the resources and literals and the properties form the arcs between them.

Conceptually, RxPath can be seen as a mapping of this graph to the tree structure defined by the XPath data model. Though RxPath cannot be implemented as materialized XML (indeed, the mapping can lead to infinitely deep trees), it can be visualized as operating on an XML document fragment (with multiple root elements) that looks very similar to the striped syntax of W3C's RDF/XML serialization. Consider this fragment of RDF/XML derived from example 4 in the RDF/XML Syntax Recommendation [RDF/XML]:

<ex:Document rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
   xmlns:ex='http://example.org/stuff/1.0/'
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/elements/1.1/" >
  <ex:editor>
    <rdf:Description rdf:nodeID="abc">
      <ex:homePage>
        <rdf:Description rdf:about="http://purl.org/net/dajobe/" />
      </ex:homePage>
      <ex:fullName>Dave Beckett</ex:fullName>
    </rdf:Description>
  </ex:editor>
  <dc:title xml:lang="en">RDF/XML Syntax Specification (Revised)</dc:title>
</ex:Document>

The RDF model described in this example could be written out as the following statements:

<http://www.w3.org/TR/rdf-syntax-grammar> <ex:editor> _:abc.
<http://www.w3.org/TR/rdf-syntax-grammar> <rdf:type> <ex:Document>.
<http://www.w3.org/TR/rdf-syntax-grammar> <dc:title> "RDF/XML Syntax Specification (Revised)"@en.
_:abc <ex:homePage> <http://purl.org/net/dajobe/>.
_:abc <ex:fullName> "Dave Beckett".

Note the "_:abc" which corresponds to the resource with the nodeID attribute. RDF allows resources with no URI identifier ("blank nodes"), but these nodes still need a private identifier to distinguish them. In addition to these statements, an RDF processor that supports RDF Schema[RDF Schema] would be able to infer a few more statements about these resources, including:

<ex:Document> <rdf:type> <rdfs:Class>.
<http://purl.org/net/dajobe/> <rdf:type> <rdfs:Resource>.
_:abc <rdf:type> <rdfs:Resource>.
<ex:editor> <rdf:type> <rdf:Property>.
(etc. for each property...)

The next example shows an XML fragment that illustrates how these statements are represented in RxPath. The XML closely follows the RDF/XML pattern in the previous example, with some minor differences, namely, that we use a "bnode:" (pseudo) URL scheme for blank nodes to avoid the need for a nodeID attribute. As the example illustrates, every resource in the RDF graph appears as a top-level element and each resource element always enumerates all its properties regardless of whether the resource is in the subject or object position. Also note that resource and property elements are ordered by their corresponding URI reference.

<rdfs:Resource rdf:about="bnode:abc">
  <ex:homePage>
    <rdfs:Resource rdf:about="http://purl.org/net/dajobe/" />        
  </ex:homePage>
  <ex:fullName>Dave Beckett</ex:fullName>
  <rdf:type>
    <rdfs:Class rdf:about="http://www.w3.org/2000/01/rdf-schema#Resource" />
  </rdf:type>    
</rdfs:Resource>

<rdfs:Resource rdf:about="http://example.org/stuff/1.0/Document">
  <rdf:type>
    <rdfs:Class rdf:about="http://www.w3.org/2000/01/rdf-schema#Class" />
  </rdf:type>    
</rdfs:Resource>

<rdfs:Resource rdf:about="http://purl.org/net/dajobe/">
  <rdf:type>
    <rdfs:Class rdf:about="http://www.w3.org/2000/01/rdf-schema#Resource" />
  </rdf:type>    
</rdfs:Resource>

<ex:Document rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
  <dc:title xml:lang="en">RDF/XML Syntax Specification (Revised)</dc:title>
  <ex:editor>
    <rdfs:Resource rdf:about="bnode:abc">
     <ex:homePage>
      <rdfs:Resource rdf:about="http://purl.org/net/dajobe/"/>
     </ex:homePage>
     <ex:fullName>Dave Beckett</ex:fullName>
    </rdfs:Resource>
  </ex:editor>
  <rdf:type>
    <rdfs:Class rdf:about="http://example.org/stuff/1.0/Document" />
  </rdf:type>    
</ex:Document>

(Note: rdf:Property statements omitted for conciseness).

Applying standard XPath expressions to this XML fragment, we can see that:

  • /* selects all the resources in the model
  • /ex:Document selects all the resources of type "http://example.org/stuff/1.0/Document"
  • /*/* selects all the property elements. This is equivalent to selecting all the statements in the model, as for each statement there will be one property element (whose parent is the subject element and whose only child is the object node).
  • /*/ex:editor/* selects all resources that are editors.
  • /*/ex:editor/*/ex:homePage/* selects all the home page resources of editors. (If the home page could be a literal, we could instead use /*/ex:editor/*/ex:homePage/text() to select only literals and /*/ex:editor/*/ex:homePage/node() to select both text and elements.)
  • /*[@rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"] selects the resource identified by the given URI.

This example illustrates the basic rules for mapping an RDF model to an RxPath data model:

  • The XPath "document" will have a root element (a "subject element") for every resource in the RDF model, with its element name equal to the QName abbreviation of the resource's class (the value of its rdf:type property -- note that there can be more than one, see below). It will have one attribute, "rdf:about", whose value is the URI reference of the resource.
  • Each of those root elements has a child element (a "predicate element") for each statement in the model that the resource is the subject of. The name of each child element is the QName abbreviation of the property URI reference in the statement. The predicate element will have an attribute named "uri" whose value is also the URI reference of the property (this attribute is omitted from the example above).1
  • Each of these predicate elements has exactly one child node (the "object node"): if the object of the statement is a literal, it will be a text node; if it is a resource, an element node. If the object is a literal, the predicate element may also have an rdf:datatype or an xml:lang attribute.
  • If the object is a resource, the element node will have the same name, attributes and children as the equivalent subject element. (Since the RDF model is a graph that can contain cycles, this can lead to an infinitely deep tree.)
  • Additional transformation rules are used to provide extra syntactic support for RDF collections and containers (see below).
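
The mapping rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference implementation: statements are plain (subject, predicate, object) tuples, property and class URIs are written as already-abbreviated QName strings (so ordering by name is a simplification of ordering by URI reference), and the helper names (Literal, element_name, root_elements, children, render) are invented for this sketch. No schema inference is performed, so a resource without an explicit rdf:type is simply named rdfs:Resource, and when a resource has several types the sketch arbitrarily uses the first in sorted order.

```python
class Literal(str):
    """Marks an object that is a literal string rather than a resource."""

DOC = "http://www.w3.org/TR/rdf-syntax-grammar"
TRIPLES = [
    (DOC, "ex:editor", "bnode:abc"),
    (DOC, "rdf:type", "ex:Document"),
    (DOC, "dc:title", Literal("RDF/XML Syntax Specification (Revised)")),
    ("bnode:abc", "ex:homePage", "http://purl.org/net/dajobe/"),
    ("bnode:abc", "ex:fullName", Literal("Dave Beckett")),
]

def element_name(triples, uri):
    """Name a subject/object element after the resource's class."""
    types = sorted(o for s, p, o in triples if s == uri and p == "rdf:type")
    return types[0] if types else "rdfs:Resource"  # assumption: default name

def root_elements(triples):
    """One subject element per resource, ordered by URI reference."""
    resources = {s for s, _, _ in triples}
    resources |= {o for _, _, o in triples if not isinstance(o, Literal)}
    return [(element_name(triples, uri), uri) for uri in sorted(resources)]

def children(triples, uri):
    """Predicate elements of a resource: one (predicate, object) pair per statement."""
    return sorted((p, o) for s, p, o in triples if s == uri)

def render(triples, uri, depth, indent=0):
    """Expand a resource element lazily, down to `depth` levels of statements."""
    pad = "  " * indent
    name = element_name(triples, uri)
    lines = [f'{pad}<{name} rdf:about="{uri}">']
    if depth > 0:
        for pred, obj in children(triples, uri):
            if isinstance(obj, Literal):      # literal object -> text node
                lines.append(f"{pad}  <{pred}>{obj}</{pred}>")
            else:                             # resource object -> element node
                lines.append(f"{pad}  <{pred}>")
                lines += render(triples, obj, depth - 1, indent + 2)
                lines.append(f"{pad}  </{pred}>")
    lines.append(f"{pad}</{name}>")
    return lines

print("\n".join(render(TRIPLES, DOC, depth=2)))
```

Because object elements are expanded on demand up to a requested depth, the sketch never has to materialize the (possibly infinite) tree.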

RxPath Expression Semantics

RxPath expressions are evaluated exactly as if they were XPath expressions being applied to an XML document fragment, except for the modifications described below.

String Value

In RxPath, the string value of a Subject or Object element is its URI reference (the value of its rdf:about attribute). This allows for a more concise way to select a particular resource with a given URI, e.g.: /*[.='http://www.w3.org/TR/rdf-syntax-grammar'] will select the resource in the previous example. Also, the root node's string value will be a URI that should represent the RDF model or its source document. With these exceptions, RxPath follows XPath for determining the string-value of a node. Note that a consequence of this rule is that the string-value of a Predicate element will be the string-value of its Object node -- either a literal or the object's URI reference. For example, /*/dc:title[.="RDF/XML Syntax Specification (Revised)"] selects a statement with the given property and object.

Name tests

In XPath, a qualified XML name (QName) in an expression like html:body is called a name test and matches an element with the equivalent name. In the previous example we saw how RxPath uses name tests to match resources by class and properties by name. Although this is intuitively equivalent to an XPath name test given RxPath's RDF/XML-like serialization, we need specific semantics to support the full range of possibilities expressible with an RDF model. First, we need to define how to convert a QName's namespace URI and local name to the URI reference of an RDF property or class resource. Simple concatenation (the rule used by the RDF/XML specification) is not sufficient to represent all possible URIs as a QName because a URI ending in a character that is not a valid XML name character would require the QName to have an empty local name. To handle this case RxPath uses this rule: QNames are converted to URI references by concatenating the QName's namespace name (the URI) with its local name, unless the local name is a sequence of one or more '_' characters; in this case the local name is substituted with a possibly empty string equal to the local name minus one '_'.
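
The conversion rule can be written as a short function. This is a minimal sketch: the function name qname_to_uri is hypothetical, and it assumes the QName has already been split into its namespace URI and local name.

```python
def qname_to_uri(namespace_uri: str, local_name: str) -> str:
    """Convert a QName (namespace URI plus local name) to a URI reference."""
    if local_name and set(local_name) == {"_"}:
        # A local name made only of underscores stands for itself minus one '_'.
        local_name = local_name[1:]
    return namespace_uri + local_name

print(qname_to_uri("http://example.org/stuff/1.0/", "Document"))
# http://example.org/stuff/1.0/Document
print(qname_to_uri("http://purl.org/net/dajobe/", "_"))
# http://purl.org/net/dajobe/
```

With this rule a URI that ends in "/" can still be named by a QName whose local name is "_", while a URI that really does end in a single underscore is written with the local name "__".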

Given this mapping, an RxPath name test matches a subject or object element when the RDF statement <element URI reference> rdf:type <NameTest URI reference>. holds true, where the NameTest URI reference is the URI obtained from the name test's QName (as described above). This allows resources with more than one type to properly match the name test. If the node is a predicate element, the test matches when the name matches a property URI reference that subsumes the property represented by the element's URI reference. For example, if an RxPath processor supports RDF Schema, the match will occur when the statement <element URI reference> rdfs:subPropertyOf <NameTest URI reference>. holds true.
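
As an illustration of these matching rules, the sketch below assumes an RDF Schema-aware processor and stands in for its inference with a simple reachability computation over rdfs:subClassOf and rdfs:subPropertyOf statements. The function names and the ex: resources are hypothetical, and URIs are written as QName strings for brevity.

```python
def closure(triples, start, relation):
    """Return `start` plus every resource reachable from it via `relation` arcs."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in triples:
            if s == node and p == relation and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen

def matches_resource(triples, uri, test_uri):
    """Name test on a subject/object element: does `uri rdf:type test_uri` hold?"""
    types = [o for s, p, o in triples if s == uri and p == "rdf:type"]
    return any(test_uri in closure(triples, t, "rdfs:subClassOf") for t in types)

def matches_predicate(triples, property_uri, test_uri):
    """Name test on a predicate element: does the tested property subsume it?"""
    return test_uri in closure(triples, property_uri, "rdfs:subPropertyOf")

SCHEMA = [
    ("ex:doc1", "rdf:type", "ex:Spec"),
    ("ex:doc1", "rdf:type", "ex:Draft"),           # a second, unrelated type
    ("ex:Spec", "rdfs:subClassOf", "ex:Document"),
    ("ex:editor", "rdfs:subPropertyOf", "ex:contributor"),
]

print(matches_resource(SCHEMA, "ex:doc1", "ex:Document"))        # True
print(matches_predicate(SCHEMA, "ex:editor", "ex:contributor"))  # True
print(matches_predicate(SCHEMA, "ex:contributor", "ex:editor"))  # False
```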

Descendant and ancestor axes

RxPath constrains the semantics of XPath's descendant and ancestor axes 2 to enable them to behave like a transitive join. For example, we can use /*[.='http://example.org/great-great-grandmother']//ex:parent-of/* to select all the descendants of the resource http://example.org/great-great-grandmother. This is accomplished through two constraints: First, an XPath step with a descendant (or ancestor) axis (e.g. //ex:parent-of) will only test predicate elements for a match, skipping over subject and object elements. Second, once a predicate element that doesn't match is found the search stops (unlike XPath, which continues to search descendants for matches).

By emulating a transitive join, RxPath provides an intuitive behavior for a RDF query language and -- perhaps more importantly -- avoids situations requiring the RxPath processor to exhaustively follow every statement with a resource as an object, often an extremely expensive approach. Also, without placing the Predicate element constraint on the descendant and ancestor axes it would be impossible to statically analyze an RxPath expression to determine if each step refers to a subject, predicate, or object, thus making efficient implementation of certain kinds of RxPath processors (for example, an RxPath to SQL converter) much more difficult.

Handling circularity

Given the mapping rules described so far, if an RDF graph contained a cycle, it would map to a tree containing infinitely deep branches. To avoid this, RxPath specifies special handling for cycles like this -- but only when it needs to -- and it only needs to when traversing a descendant axis. For all other XPath expressions, no special handling of cycles is necessary. To see why, consider this RxPath expression /*/rdf:type/*/rdf:type/* applied to an RDF graph that includes this statement: <rdfs:Class> <rdf:type> <rdfs:Class>. (in fact, with RDF Schema this statement can be inferred from any RDF graph). Intuitively, one would expect the expression to execute without error and the rdfs:Class resource to be included in the result, even though the pattern matches a redundant element generated through a cycle.

But if you consider an expression such as /*//rdf:type/* you would likely expect the rdfs:Class resource to appear only once in the result. To implement this behavior RxPath specifies that when evaluating an XPath step with one of the descendant axes the processor enters a circularity checking mode, which behaves as follows: when evaluating a child object element, check if any of the ancestor nodes (up to the node where the descendant axis began its search) have the same resource URI as the object. If so, the object node will appear to have no children, and the search will not descend any further. Following this rule, if a resource is an object of itself through some predicate (either directly or indirectly), it will only appear once in the expression's result.
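
The constrained descendant axis and its circularity check can be illustrated with a small recursive traversal. This is a sketch under simplifying assumptions rather than the processor's actual algorithm: statements are (subject, predicate, object) tuples of strings, predicates are compared by exact name (no sub-property matching), and the result is a list of object URIs rather than a node-set. The function name descendant_objects is invented for the example.

```python
def descendant_objects(triples, start, predicate):
    """Emulate start//predicate/* by following only `predicate` arcs, transitively."""
    results = []

    def visit(uri, ancestors):
        for s, p, o in sorted(triples):
            if s != uri or p != predicate:
                continue  # a non-matching predicate ends the search on that branch
            results.append(o)
            # Circularity check: an object whose URI already occurs among its
            # ancestors is reported but treated as having no children.
            if o not in ancestors:
                visit(o, ancestors | {o})

    visit(start, {start})
    return results

FAMILY = [
    ("a", "ex:parent-of", "b"),
    ("b", "ex:parent-of", "c"),
    ("b", "ex:friend", "z"),
    ("z", "ex:parent-of", "y"),   # never reached: ex:friend does not match
    ("c", "ex:parent-of", "a"),   # closes a cycle back to the start
]

print(descendant_objects(FAMILY, "a", "ex:parent-of"))  # ['b', 'c', 'a']
print(descendant_objects([("rdfs:Class", "rdf:type", "rdfs:Class")],
                         "rdfs:Class", "rdf:type"))     # ['rdfs:Class']
```

The traversal only ever follows statements with the requested predicate, which is why it behaves like a transitive join instead of an exhaustive walk of the graph.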

Node identity and document order

RxPath does not change XPath's rules for node identity, node comparison, or document order. It slightly modifies the semantics of XPath's id() function: id values look up the top-level subject element with the equivalent URI reference. Some consequences of these rules:

  • The result of an XPath expression can contain multiple nodes that refer to the same resource (e.g. a subject element and an object element).
  • But nodes that refer to the same resource will appear equal because XPath comparison and equality rely on the string value of the node, which, in the case of resource elements, is the resource's URI reference.
  • id() can be used to remove such duplicate nodes.
  • The document order ensures that RDF list and transitive queries will return nodes in the natural order.

RDF collections and containers

RDF provides some classes and properties for representing multi-valued properties. They are referred to as RDF collections (rdf:List) and RDF containers (rdf:Seq, rdf:Bag and rdf:Alt). To provide more natural and expressive queries, RxPath provides special case mappings for these classes by omitting the RDF statements used to represent item order (i.e. the anonymous list node resources used by RDF collections and the rdf:_n properties used by RDF containers) and instead expresses this using the document order. This is best illustrated by an example of how RxPath represents an RDF list:

    <rdf:List rdf:about="http://example.com#exampleList" >
        <rdf:first listID="http://example.com#exampleList" >
          <rdf:Description rdf:about="http://example.com#someResource" />          
        </rdf:first>
        <rdf:first listID="bnode:list1List">the second list item, which is a
        literal</rdf:first>
    </rdf:List> 
    

This example shows a list resource with two items, a resource and a literal. Given this XML pattern we can see how /*[.='http://example.com#exampleList']/rdf:first/* selects all the items of the list and (/*[.='http://example.com#exampleList']/rdf:first/*)[last()] selects the last item. Note the listID attributes on the rdf:first elements. This attribute records the node ids of the anonymous resource nodes omitted from the RxPath model, enabling us to meet the requirement of having our mapping preserve all the information in the source RDF model.
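
A sketch of how the flattening might be computed from the underlying statements is shown below. It assumes the usual rdf:first / rdf:rest / rdf:nil encoding of an RDF collection, with URIs written as strings; the function name list_items is hypothetical. Each item is paired with the list node it came from, which is the value the listID attribute records.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def list_items(triples, head):
    """Return [(listID, item), ...] in list order for the collection starting at `head`."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items, node, seen = [], head, set()
    # Walk the linked list; `seen` guards against a malformed, cyclic list.
    while node != RDF_NIL and node in first and node not in seen:
        seen.add(node)
        items.append((node, first[node]))
        node = rest.get(node, RDF_NIL)
    return items

LIST_TRIPLES = [
    ("http://example.com#exampleList", RDF_FIRST, "http://example.com#someResource"),
    ("http://example.com#exampleList", RDF_REST, "bnode:list1List"),
    ("bnode:list1List", RDF_FIRST, "the second list item, which is a literal"),
    ("bnode:list1List", RDF_REST, RDF_NIL),
]

for list_id, item in list_items(LIST_TRIPLES, "http://example.com#exampleList"):
    print(list_id, "->", item)
```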

The mapping for RDF containers is similar, except instead of using rdf:first, the property rdfs:member is used to replace the rdf:_1, rdf:_2 (etc.) properties that denote the container's order, and the listID attribute is set to the source rdf:_n property.
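
The container case can be sketched the same way. The snippet below is an illustration only (the function name container_members and the ex: data are invented): it collects a container's rdf:_n statements, orders them numerically, and returns each member together with its source property, which is what the listID attribute would carry on the corresponding rdfs:member element.

```python
import re

MEMBERSHIP = re.compile(r"rdf:_([1-9][0-9]*)")

def container_members(triples, container):
    """Return [(listID, member), ...] ordered by the n of each rdf:_n property."""
    members = []
    for s, p, o in triples:
        match = MEMBERSHIP.fullmatch(p)
        if s == container and match:
            members.append((int(match.group(1)), p, o))
    # Sort numerically so that rdf:_10 follows rdf:_2 instead of preceding it.
    return [(p, o) for _, p, o in sorted(members)]

CONTAINER_TRIPLES = [
    ("ex:seq", "rdf:type", "rdf:Seq"),
    ("ex:seq", "rdf:_10", "ex:j"),
    ("ex:seq", "rdf:_2", "ex:b"),
    ("ex:seq", "rdf:_1", "ex:a"),
]

print(container_members(CONTAINER_TRIPLES, "ex:seq"))
```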

RxPath extension functions

RxPath adds the following XPath extension functions to the core set of XPath functions:

rdfdocument()

This function is equivalent to XSLT's document() function except it assumes the input URLs are RDF documents, not XML, and returns RxPath Document nodes instead of XML Document nodes. This function can be used to access RxPath nodes while processing standard XML.

uri()

Returns a URI reference based on the QName of the given node, using the algorithm described above. If the argument is a string, its value is treated as a QName and converted using the current namespace mapping.

name-from-uri(), local-name-from-uri(), namespace-uri-from-uri()

Given a URI reference, compute and return an appropriate QName (or part of the QName). (Calling these functions may cause a new namespace mapping to be added to the RxPath DOM.)

is-resource(), is-predicate()

These functions test whether the given nodes are resource or predicate elements, respectively.

is-instance-of(), is-subclass-of(), is-subproperty-of()

These functions return true if any of the resources specified in the first argument is an instance of/subclass of/subproperty of any of the class or property resources specified in the second argument, respectively. (These functions are defined as such to match XPath nodeset equality semantics.)

get-statement-uris()

Given a nodeset of predicate elements, return a nodeset of subject elements corresponding to the rdf:Statement resources reifying the statements. 3

get-context()

Given a URI associated with a named graph, return a Document node that contains only the statements in that named graph. This function is available when the underlying RDF store supports a common extension to the RDF model called named graphs[Named Graphs] (also called RDF contexts). A named graph is a subset of an RDF graph that is identified by a URI.

Comparison with other attempts at applying XPath and XSLT to RDF

There have been several other attempts to marry XPath and RDF. This section reviews some of the approaches taken and compares them to RxPath.

Syntactic

The simplest approach to using RDF with XPath is to just apply XPath expressions to an XML serialization of the RDF model. For a discussion and example advocating this approach see [Syntactic-Web]. One difficulty with this approach is that the standard RDF/XML serialization is complex and non-deterministic -- there are many possible serializations for a given RDF model. Therefore, this approach usually relies on a more deterministic subset of the RDF/XML serialization, or on alternative, simpler serializations such as TriX[TriX]. Another, more fundamental, difficulty is that the XPath expressions don't know about the semantics of the model -- for example, inferred statements or subsumption relationships such as subclass and subproperty. While an XML serialization could include a materialization of all the inferences possible with a given ontology, this is clearly too expensive for many, if not most, applications. Alternative approaches to this problem include using XPath extension functions that understand the RDF model's semantics or writing complicated queries that reflect the semantics of the ontology.

XPath-like query languages

The majority of approaches are what might be called "XPath-like": query languages with syntax and semantics similar to XPath. A survey of these can be found in [RDF Path]. While these approaches have the benefits of XPath's expressiveness and familiarity, they do not meet our requirements as described above, as they are neither syntactically identical to XPath nor designed for use with languages that utilize XPath (such as XSLT).

XPath/XSLT extensions

Another approach to enabling RDF for use with XPath and XSLT is to define XPath extension functions and XSLT extension elements. These approaches are similar to RxPath in that they can be used by standard XPath and XSLT tools, and in fact RxPath provides an XPath extension function (rdfdocument()) to enable the same type of integration. However, solely using extension functions limits the depth of the integration between RDF and XPath. Because the relationship between RDF and the XPath data model is not specified, there are many applications of XPath where this approach cannot be used. For example, function calls are not allowed in XSLT match patterns, so extension functions cannot be used in that context. Another example is XPath applications that modify the data model, such as XForms and XUpdate.

Some of the more well-known examples of this approach:

Twig

Twig[Twig] is a set of XPath extension functions that return various XML serializations of the RDF model, such as breadth-first and depth-first trees. Thus standard XPath can be applied to the resulting XML.

Nemo

Nemo[Nemo] is in essence an XSLT extension function for accessing Jena's query facilities. The query result is made available to the XSLT processor (Saxon) using Saxon's mechanism for accessing arbitrary Java objects.

Treehugger

Of the approaches discussed here, TreeHugger[Treehugger] is closest to RxPath. It provides an extension function that enables XPath expressions to be applied to the RDF document, using a mapping similar to RxPath's. However, it only supports a small subset of XPath, basically just the child and parent axes.

Semantic

The final approach to marrying RDF and XPath is semantic: defining a mapping between the semantics of RDF and XPath models. One way [Yin-Yang] to do this is to formally describe the XPath data model using the same model theory used to formally describe RDF semantics. This enables an XPath (or XQuery) processor to gain semantic understanding of both RDF and XML data. This is a deep approach, but it is one that requires a strong understanding of RDF semantics to use, and there are still open research questions that need to be resolved. One can say that RxPath does the inverse mapping: from the RDF abstract syntax to the XPath data model. However, this mapping is not quite semantic -- it doesn't attempt to retain implied semantic information about RDF statements in the model -- interpretation is implementation dependent (most RDF query languages also take this approach).

Comparison with SPARQL

This section compares RxPath to SPARQL, the W3C's recommended RDF query language [SPARQL]. For an independent comparison of RxPath to other RDF query languages see [Haase].

Advantages

SPARQL is tuple-based (as are most RDF query languages), with a syntax reminiscent of SQL. RxPath is path or traversal based. As an RDF model is easily conceptualized as a graph, a traversal-based syntax tends to be more compact and, to many, more intuitive. Consider this simple RxPath expression:

/ex:Document/ex:editor/*/ex:homePage/*

Compare this to the equivalent query in SPARQL:

PREFIX ex: <http://example.org/stuff/1.0/>
SELECT ?homePage
WHERE { 
     ?doc a  ex:Document ;
             ex:editor ?editor .
     ?editor ex:homePage ?homePage .
}

RxPath supports a few types of queries that cannot be expressed in SPARQL:

  • Collections and containers. While SPARQL has some syntactic sugar for writing an RDF list (but not RDF containers), it has no mechanism for queries such as selecting all items in a list or selecting the last list item when the length of the list is not known.
  • Transitive and inverse transitive queries.
  • SPARQL does not support aggregates, while RxPath can take advantage of the simple aggregates that XPath supports, like count() and sum().

Disadvantages

The main drawback of RxPath compared to a tuple-based query language like SPARQL is the lack of structure in query results: XPath expressions can only return a single number, boolean or string value, or an unordered, non-nesting, un-typed set of nodes. Thus results have no order, and duplicates and null values are never present, so correlations between nodes cannot be determined.

Building on RxPath

One of the most useful aspects of RxPath is its ability to re-use many of the special purpose languages that rely on XPath; RxPath, being syntactically identical to XPath, can be substituted, creating RDF equivalents. However, for these to be useful or even implementable, often the semantics need modification or refinement. This section discusses a few of these languages.

XSLT

Combining XSLT with RxPath gives us a language for transforming RDF to XML, called RxSLT. It is syntactically identical to XSLT 1.0 and behaves similarly, with the following two exceptions. First, the XSLT patterns used for template matching are limited to absolute path patterns. This is done for the same reasons the descendant axis is constrained, as outlined above. Second, RxSLT defines special behavior for xsl:copy-of (but not xsl:copy) -- when the processor encounters an object element, instead of copying it and its children, the object element is replaced with an attribute named rdf:resource whose value is the URI reference of the object.

Schematron

Schematron[Schematron] is a validation language that uses XPath expressions as assertions about the validity of an XML document. Using RxPath, Schematron can be used to validate an RDF model. The benefits of using Schematron to validate XML also apply to validating RDF: Schematron allows complex, ad-hoc assertions to be expressed that can't easily be expressed with other schema languages. For example, because languages like RDF Schema and OWL[OWL] are based on an open world model, they can't define constraints that apply against the entire model (such as uniqueness or default values). And compared to languages like OWL, Schematron is easier to write and understand and requires much less specialized knowledge.

Unlike the standard Schematron, the RxPath implementation of Schematron allows any RxPath expression to be used, not just the restricted XSLT pattern syntax. This allows the use of variable references, which enables a Schematron stylesheet to incrementally validate changes to an RDF model if the system provides XPath variables that reference the added or removed nodes.

XUpdate

XUpdate[XUpdate] is an XML vocabulary for describing modifications to an XML document. It defines operations such as "append", "remove", and "rename", and uses XPath to select parts of the document. With RxPath, we can use this vocabulary to update an RDF graph. This is particularly useful because there is no standard language for updating RDF graphs.

The RxPath implementation of XUpdate makes modifying the document in any way other than adding or removing Subject and Predicate elements an error. This constraint has the effect that XUpdate operations can add or remove statements in the RDF model, but never modify an existing statement. When XML is added to the document (e.g., when using the xupdate:append element), the XML must conform to the subset of RDF/XML that matches the RxPath serialization and is treated as the equivalent RDF statements.

The original XUpdate specification was never completed and the RxPath reference implementation adds several general purpose features, including variables, conditional processing, iteration, attribute value templates, messages, and callable templates. Where feasible, the design and presentation of each of these features follows the equivalent functionality in XSLT.

DOM API

An RxPath implementation can provide a programmatic API to the data model using the standard DOM interface (DOM Level 3 XPath[DOM XPath]). This enables RDF models to be programmatically manipulated by a well-known, standard interface as opposed to learning one specific to RDF.

Implementation

RxPath and the RxPath-derived languages discussed above have been implemented in Python and released as open source4.

The reference RxPath processor is designed to enable pluggable query engines that can work with 3rd party RDF stores and query languages. The current implementation uses a query engine that generates a physical data access plan (using simple optimization strategies) and can work with any RDF store that supports a minimal set of simple lookup primitives. Future plans include developing query engines that translate RxPath expressions to SPARQL or directly to SQL. The RxPath processor uses the following steps to execute an RxPath expression:

  1. Substitute any XPath variable references with the appropriate value.
  2. Translate the XPath expression into a simpler abstract syntax tree (AST) that represents the query in terms of relational algebra operations. All RxPath expressions can be represented by a small set of relational algebra operators applied to a set of triples that represent the statements in the model:
    Select

    XPath predicates and node tests correspond to the select operation. This operation needs to support sub-queries in the select predicate; if the corresponding XPath sub-expression is a relative path, the query will be correlated with the outer query, otherwise (i.e. with an absolute XPath path) it will not be correlated.

    Join

    The only join operator needed is a reflexive equijoin between the object and subject of a statement. The descendant axis corresponds to a recursive (transitive closure) version of this join.

    Project

    XPath's parent and ancestor axes correspond to a project operation on the joined result set, as does the position of the final step of an XPath path.

    Union

    Used for the XPath union operator.

    Group-by/Distinct

    Used to determine nodeset uniqueness and to reconstruct the XPath context (see below).

  3. Next, the AST may be transformed. Transformations happen for a few reasons: first, to provide schema support if it is not provided by the underlying query engine, for example by updating a property match to also match its sub-properties; second, to optimize the query, for example by reversing join operands; third, to accommodate the underlying query engine, for example, SPARQL doesn't support a recursive join, so the join would be replaced with a union of some configurable number of optional joins.
  4. Execute the query using the specified query engine. The engine may translate the AST directly into a physical data access plan (as in the current implementation) or translate it to another query language such as SQL or SPARQL. For example, the AST can be translated into SPARQL in a straightforward manner, with the exception that SPARQL doesn't directly support nested queries as filter conditions; these can be translated using optional graph patterns for correlated (relative) sub-expressions and the union operator for non-correlated (absolute) sub-expressions.
  5. Filter rows in the result set by evaluating any predicates that couldn't be handled by the given query engine. For XPath functions that aren't translatable to query primitives (this includes numeric predicates, which are shorthand for the position() function), the filter condition will be dynamically applied to the result set, with the XPath function called with a simulated XPath context. This allows the implementation to be compatible with standard XPath extension functions. The XPath context is recreated as follows:
    • We can obtain RxPath's notion of document order by lexicographically sorting the result set.
    • We determine the position and size attributes of the XPath context by performing the equivalent of a "group by" operation on all the rows to the left of the current evaluation point.
    • The context's current node can be synthesized by building a DOM node based on values in the current row of the result set. Attributes such as siblings and children that can't be determined from the result set are lazily computed on request by executing additional queries.

    If the XPath function takes additional parameters that correspond to correlated queries (e.g. for an XPath expression like /*[myExtFuncs:custom-test(./ex:someProp)] ), we can create node sets from the appropriate columns of the result set, one for each row.
  6. Convert the final result set to a node set as described in the previous step.
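The operators in step 2 and the context reconstruction in step 5 can be sketched over a toy set of triples. This is an illustrative sketch only, not the reference processor's code: the sample data and the function names `select`, `join`, `descendants`, and `with_position` are assumptions, plain tuple sorting stands in for RxPath's document order, and `descendants` follows only object-to-subject links rather than the full tree of predicate and object nodes.

```python
from collections import defaultdict

# Assumed toy model: each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    ("ex:a", "ex:child", "ex:b"),
    ("ex:b", "ex:child", "ex:c"),
    ("ex:c", "ex:name", "C"),
    ("ex:a", "ex:name", "A"),
]


def select(rows, test):
    """Select: keep the rows that satisfy a predicate or node test."""
    return [r for r in rows if test(r)]


def join(left, right):
    """Equijoin: object of the left statement equals subject of the right."""
    by_subject = defaultdict(list)
    for r in right:
        by_subject[r[0]].append(r)
    return [(l, r) for l in left for r in by_subject.get(l[2], [])]


def descendants(start, rows):
    """Recursive join: everything reachable from `start` via object -> subject."""
    seen, frontier = set(), {start}
    while frontier:
        reached = {o for (s, _p, o) in rows if s in frontier} - seen
        seen |= reached
        frontier = reached  # only newly reached nodes are expanded again
    return seen


def with_position(rows, key):
    """Sort rows, group them by `key`, and attach (position, size) per group."""
    groups = defaultdict(list)
    for row in sorted(rows):
        groups[key(row)].append(row)
    return [(row, i + 1, len(g))
            for g in groups.values() for i, row in enumerate(g)]


names = select(TRIPLES, lambda r: r[1] == "ex:name")
children = select(TRIPLES, lambda r: r[1] == "ex:child")
child_names = join(children, names)          # children that have a name
reachable = descendants("ex:a", TRIPLES)     # transitive closure from ex:a
positioned = with_position(TRIPLES, key=lambda r: r[0])
```

Here `with_position` plays the role of the "group by" described in step 5: grouping on the subject column gives each statement its position among the statements of the same subject, together with the size of that group.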

Conclusions and Future Work

The usefulness of RxPath's approach is illustrated by Rhizome[Rhizome]. Rhizome is a web application framework that makes it possible to rapidly build, in a Wiki-like fashion, web applications that work directly with RDF representations. Rhizome extensively uses RxPath's mappings of XSLT, XUpdate and Schematron to implement application logic; without RxPath, much of this functionality could not be implemented in a declarative way and would instead need to rely on the non-standard, programmatic APIs of a particular RDF library.

Future areas of investigation for RxPath include:

  • Improving the efficiency of the implementation, in particular, adding a data plan engine that can translate RxPath expressions to SPARQL, as discussed above.
  • Supporting ontology languages such as OWL (currently only RDF Schema is supported). This will probably require modification of RxPath semantics; for example, OWL[OWL] allows the inferencing of equivalences between individual resource URIs (e.g. by using owl:sameAs), which may impact the semantics of node equality.
  • Experimenting with using additional XPath-based languages with RxPath, in particular XForms. Defining an XForms mapping for RxPath would provide a declarative way to create a user interface for presenting and updating RDF data. A direct mapping is simple, but a more involved one is needed to enable the creation of XForms documents that work with XForms clients that do not support RxPath, such as web browsers.
  • Updating the RxPath mapping to the XPath 2.0/XQuery 1.0 data model. This would result in a more expressive language (with full FLWOR (for, let, where, order by, and return) expressions) that would remove some of the disadvantages of RxPath discussed above, such as the lack of structure and order in XPath nodesets. Supporting XQuery would also enable more powerful aggregates and better support for XML Schema datatypes.

Notes

1.

This attribute is not strictly necessary since it does not add information to the tree, but is included because it makes writing certain queries easier.

2.

Note that the rules described here for descendant and ancestor axes also apply to the other XPath axes that encompass either descendant or ancestor searching: descendant-or-self, following, preceding, and ancestor-or-self.

3.

RxPath uses this function instead of an attribute on the predicate element (as the RDF/XML syntax does with the rdf:ID attribute) because it is possible and valid for a statement to be reified by more than one rdf:Statement resource.

4.

RxPath is available at http://sourceforge.net/projects/rx4rdf.


Bibliography

[DOM XPath] Document Object Model (DOM) Level 3 XPath Specification Version 1.0, W3C Note, 2004 http://www.w3.org/TR/DOM-Level-3-XPath

[Haase] Peter Haase and Jeen Broekstra and Andreas Eberhart and Raphael Volz A comparison of RDF query languages. Proceedings of the Third International Semantic Web Conference. Hiroshima, Japan, 2004. http://www.aifb.uni-karlsruhe.de/WBS/pha/rdf-query/.

[Named Graphs] J.J. Carroll, C. Bizer, P. Hayes, P. Stickler, Named Graphs, Provenance and Trust, HP Labs Tech Report HPL-2004-57, http://www.hpl.hp.com/techreports/2004/HPL-2004-57, 2004

[Nemo] James Francis Cerra, Nemo Project home page https://nemo.dev.java.net/

[OWL] M. Dean and G. Schreiber, OWL Web Ontology Language Reference, http://www.w3.org/TR/owl-ref/, W3C, 2004

[RDF Path] Sean B. Palmer, Pondering RDF Path http://infomesh.net/2003/rdfpath (2003)

[RDF Schema] RDF Vocabulary Description Language 1.0: RDF Schema, http://www.w3.org/TR/rdf-schema/, W3C, 2004

[RDF Semantics] P. Hayes, RDF Semantics, http://www.w3.org/TR/rdf-mt/, W3C, 2004

[RDF/XML] D. Beckett, RDF/XML Syntax Specification (Revised), http://www.w3.org/TR/rdf-syntax-grammar/, W3C, 2004

[Rhizome] Adam Souzis, Building a Semantic Wiki. IEEE Intelligent Systems, vol. 20, no. 5 September/October, 2005 http://www.liminalzone.org/static/IEEE_IS_Souzis_v20n5.pdf

[RxPath] Adam Souzis, RxPath Specification http://www.liminalzone.org/RxPathSpec

[Schematron] ISO/IEC 19757-3 Document Schema Definition Languages: Part 3 - Rule-based validation - Schematron (2004)

[SPARQL] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/, W3C, 2006

[Syntactic-Web] Jonathan Robie The Syntactic Web, XML 2001, 2001 http://www.idealliance.org/papers/xml2001/papers/html/03-01-04.html.

[Treehugger] Damian Steer, Treehugger: The RDF Model Meets XPath, XML Europe 2004. http://rdfweb.org/people/damian/treehugger/

[TriX] J.J. Carroll, P. Stickler, RDF Triples in XML, Extreme 2004.

[Twig] Norm Walsh, RDF Twig: accessing RDF graphs in XSLT, Extreme 2003 http://rdftwig.sourceforge.net/

[XPath 1.0] XML Path Language (XPath) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xpath (11/1999).

[XSLT 1.0] XSL Transformations (XSLT) Version 1.0. W3C Recommendation, http://www.w3.org/TR/xslt (11/1999).

[XUpdate] XUpdate - XML Update Language Version 1.0 Andreas Laux, Lars Martin. 2000 http://xmldb-org.sourceforge.net/xupdate/.

[Yin-Yang] Peter Patel-Schneider, Jerome Simeon The Yin/Yang Web: XML Syntax and RDF Semantics 2002 http://www-db.research.bell-labs.com/user/pfps/papers/yin-yang.pdf.


