Reusing data across Topic Maps and RDF

Steve Pepper
Valentina Presutti
Lars Marius Garshol
Fabio Vitali

Abstract

This paper describes some of the trickier issues involved in creating a semantic mapping from Topic Maps to RDF and vice versa, and the solutions currently under consideration by the RDF/Topic Maps Interoperability Task Force of the W3C's Semantic Web Best Practices Working Group.

Keywords: Topic Maps; RDF; Mapping; Semantic Web

Steve Pepper

Steve Pepper is the founder of Ontopia, a company that provides Topic Maps software, consulting, and training services. Steve represents Norway on JTC1/SC34, the ISO committee responsible for the development of SGML and related standards, and is convenor of WG3 (Information Association), whose responsibilities include the HyTime and Topic Map standards. He is the editor of the XML Topic Map specification (XTM) and the author of numerous papers and presentations on topic map-related subjects, including the well-known TAO of Topic Maps. A frequent speaker at SGML, XML, and knowledge management events around the world, Steve was for many years the author and maintainer of the Whirlwind Guide to SGML and XML tools. He also co-authored (with Charles Goldfarb and Chet Ensign) the SGML Buyer's Guide (Prentice-Hall, 1998).

Valentina Presutti

Valentina Presutti graduated in Computer Science in 2002 and received a Ph.D. in Computer Science in 2006 at the University of Bologna. Currently, she is a research fellow at the Laboratory of Applied Ontology in Rome within the EU-funded project NeOn. Her research interests include Semantic Web languages, Web searching, ontology engineering and ontology-based software engineering.

Lars Marius Garshol

Lars Marius Garshol is Chief Technology Officer at Ontopia. He has been active in the XML and topic map communities as a speaker, consultant, open source developer, and technology creator for a number of years. He helped develop the standard SAX API for XML development, translated it to Python, and wrote an open-source validating XML parser in Python. Lars Marius has also been responsible for adding Unicode support to the Opera web browser. His book, Definitive XML Application Development, was published by Prentice-Hall in its Charles Goldfarb series. Lars Marius is one of the editors of the ISO Topic Map Query Language standard, and also co-editor of the Topic Map Data Model.

Fabio Vitali

Fabio Vitali is a professor at the Department of Computer Science at the University of Bologna. He holds a Laurea degree in Mathematics and a Ph.D. in Computer and Law, both from the University of Bologna. His research interests include markup languages; distributed, coordinated systems; and the World Wide Web. He is the author of several papers on hypertext functionalities, the World Wide Web, and XML.

Reusing data across Topic Maps and RDF

Steve Pepper [Ontopia]
Valentina Presutti [University of Bologna]
Lars Marius Garshol [Ontopia]
Fabio Vitali [University of Bologna]

Extreme Markup Languages 2006® (Montréal, Québec)

Copyright © 2006 Steve Pepper, Valentina Presutti, Lars Marius Garshol, and Fabio Vitali. Reproduced with permission.

1. Introduction

1.1 Background

The Resource Description Framework (RDF) is a model developed by the W3C for representing information about resources in the World Wide Web. Topic Maps is a standard for knowledge integration developed by the ISO. The two specifications were developed in parallel during the late 1990s within their separate organizations for what at first appeared to be very different purposes. The results, however, turned out to have a lot in common, and this led to calls for their unification.

While unification has not been possible (for a variety of technical and political reasons), a number of attempts have been made to uncover the synergies between RDF and Topic Maps and to find ways of achieving interoperability at the data level. There is now widespread recognition within the respective user communities that achieving such interoperability is a matter of some urgency. Since late 2004 a task force has been operating within the Semantic Web Best Practices and Deployment (SWBPD) Working Group of the W3C, with the support of the ISO Topic Maps community, in order to address this problem. Its first deliverable, A Survey of RDF/Topic Maps Interoperability Proposals [Survey], was published as a Working Group Note in March 2005; the second, Guidelines for RDF/Topic Maps Interoperability [Guidelines], is currently available as an Editors' Draft dated June 2006.

1.2 The Case for Semantic Mappings

[Survey] provides an in-depth analysis of five earlier proposals for mappings between RDF and Topic Maps (and vice versa), contributed by Graham Moore [Moore], Martin Lacher and Stefan Decker [Lacher et al], Nikita Ogievetsky [Ogievetsky], Lars Marius Garshol [Garshol], [Garshol], and Paolo Ciancarini, Riccardo Gentilucci, Marco Pirruccio, Valentina Presutti, and Fabio Vitali [Gentilucci et al], [Ciancarini et al]. It shows how these proposals fall into two distinct groups, termed object mappings and semantic mappings respectively, and sums up the basic differences between the two as follows:

  • Object mappings use the low-level building blocks of one language to describe the object model of the other. For example, assuming that the structure of a simple binary association is a quintuple, consisting of one (a)ssociation, two (r)oles, and two role (p)layers (p-r-a-r-p), that association would be represented (using an object mapping) as four RDF statements that relate five resources.
  • Semantic mappings start from higher level concepts that carry the semantics of each model and attempt to find equivalences between them. A binary association in Topic Maps would be seen to represent the same kind of “thing” that is often represented by an RDF statement (i.e., a relationship between two entities) and would therefore be represented using a single RDF statement. Where no direct semantic equivalent can be found, the missing semantics are defined using the facilities available in one of the two paradigms, i.e., classes, properties, or published subjects.

The key advantage of semantic mappings is that they “yield much more natural results”. However, they “suffer from the disadvantage that genericity is much harder to ensure and may in some cases require additional information not always present in the source document.” Despite this disadvantage, [Survey] comes out in favour of semantic mappings (“provided that a sufficient degree of completeness can be achieved”) because of the importance of naturalness. The latter is defined as

the degree to which the results of a translation correspond to the way in which someone familiar with the target paradigm would naturally express the information content in that paradigm. Naturalness normally also confers improved readability on the result.

Unnatural translation results, of the kind produced by object mappings, have a number of undesired consequences, all of which lead to reduced interoperability:

  • the result will not merge cleanly with data originating in the target model,
  • the result will not conform to vocabularies created in the target model, and
  • queries written against the target model will not work with translated data.

In conclusion, [Survey] lists what the authors consider to be the issues in creating a semantic mapping. This paper presents and discusses the proposed solutions to the most intractable of these, namely identity, reification, non-binary relationships, scope and variant names. However, before examining these issues, we present a short overview of what are regarded as generally unproblematic aspects.

1.3 Non-issues

The semantic mapping proposed by [Guidelines] is based on two fundamental equivalences (or near equivalences): that between RDF “resources” and TM “subjects”; and that between RDF statements and what are also collectively known as statements in Topic Maps, i.e., names, occurrences, and associations.

Because of these equivalences, any name, occurrence, or (binary) association can and should be mapped to its most natural representation in RDF, i.e., a single RDF statement. In the case of names and occurrences this is entirely unproblematic when going from Topic Maps to RDF, as the following examples show (see Note 1):

[puccini = foaf:name : "Giacomo Puccini"]
{puccini, bio:dateOfBirth, [[1858-12-22]]}

translates to

:puccini
  foaf:name        "Giacomo Puccini" ;
  bio:dateOfBirth  "1858-12-22" .

which is entirely natural. However, the result does not contain enough information to allow for round-tripping, since a translator will not know which (if any) of these statements should be mapped to a name and which (if any) to an occurrence. (These are the only two possibilities in this example, since the values of both statements are literals.)

In order to solve this problem, [Guidelines] mandates the use of “guidance statements” to declare that the property foaf:name is of type rdftm:NameProperty, and that bio:dateOfBirth is of type rdftm:OccurrenceProperty:

foaf:name        rdf:type  rdftm:NameProperty .
bio:dateOfBirth  rdf:type  rdftm:OccurrenceProperty .

These statements are generated when going from Topic Maps to RDF. For vocabularies that originate in the RDF paradigm and whose designers want interoperability with Topic Maps, the guidance statements are expected to be supplied as part of the ontology.
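The role that guidance statements play in the RDF2TM direction can be illustrated with a minimal Python sketch. It is not part of [Guidelines]; the function name and the representation of triples as plain (subject, predicate, object) tuples of strings are assumptions made purely for illustration.

```python
RDF_TYPE = "rdf:type"

def classify_literal_statements(triples):
    """Sort statements into names and occurrences using guidance statements.

    `triples` is a hypothetical in-memory graph: a list of
    (subject, predicate, object) tuples of strings.
    """
    # Guidance: properties declared as name or occurrence properties.
    name_props = {s for s, p, o in triples
                  if p == RDF_TYPE and o == "rdftm:NameProperty"}
    occurrence_props = {s for s, p, o in triples
                        if p == RDF_TYPE and o == "rdftm:OccurrenceProperty"}

    names, occurrences, unmapped = [], [], []
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # typing statements are not themselves names or occurrences
        if p in name_props:
            names.append((s, p, o))
        elif p in occurrence_props:
            occurrences.append((s, p, o))
        else:
            unmapped.append((s, p, o))  # no guidance available for this property
    return names, occurrences, unmapped
```

Without the two guidance statements, both literal-valued statements about Puccini would end up in the unmapped list, which is exactly the round-tripping problem described above.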

Binary associations present a slightly different challenge, which also requires the presence of guidance information in order to be solved satisfactorily: in going from Topic Maps to RDF, it is necessary to specify which of the two role-playing topics should be the subject of the resulting RDF statement; in going from RDF to Topic Maps it is necessary to specify the type of the role to be played by the subject and object of the statement respectively. For example, given the following association:

bio:born-in( puccini : foaf:Person,
             lucca   : geo:place )

should Puccini or Lucca become the subject of the resulting RDF statement? The “obvious” answer for us humans is Puccini, but that answer is only obvious because we can intuit the likely semantics of a relation with the name born-in; a machine cannot do that.

The solution proposed by [Guidelines] is to annotate the bio:born-in property using the properties rdftm:subject-role and rdftm:object-role, thus:

bio:born-in
 rdftm:subject-role  foaf:Person ;
 rdftm:object-role   geo:place .

This allows a translator to select Puccini (i.e., the topic playing the role of Person) as the subject of the resulting RDF statement, and to select Lucca (playing the role of Place) as the object, to produce the following result:

:puccini  bio:born-in  :lucca .

The same two guidance statements can be used when going the other way to tell the translator which role types to use for the subject and object of the RDF statement (i.e., foaf:Person and geo:place respectively).
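A small Python sketch shows how the two guidance properties could drive translation in both directions. The function names and data structures are hypothetical, and the sketch assumes that the two roles of the association have different types, so that they can be stored in a dictionary keyed by role type.

```python
def association_to_triple(assoc_type, roles, guidance):
    """TM2RDF: turn a binary association into a single RDF statement.

    roles:    dict mapping role type -> role-playing topic (assumed binary)
    guidance: dict mapping association type -> (subject role, object role),
              i.e. the values of rdftm:subject-role and rdftm:object-role
    """
    subject_role, object_role = guidance[assoc_type]
    return (roles[subject_role], assoc_type, roles[object_role])

def triple_to_association(triple, guidance):
    """RDF2TM: recover the association type and typed roles from a statement."""
    subject, predicate, obj = triple
    subject_role, object_role = guidance[predicate]
    return predicate, {subject_role: subject, object_role: obj}

# Usage with the example from the text.
guidance = {"bio:born-in": ("foaf:Person", "geo:place")}
roles = {"foaf:Person": ":puccini", "geo:place": ":lucca"}
triple = association_to_triple("bio:born-in", roles, guidance)
```

Because both functions consult the same guidance table, translating an association to RDF and back yields the original role assignments.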

Two classes and two properties are thus sufficient to provide the annotations necessary to enable natural translations of names, occurrences and binary associations in TM2RDF, and (almost) any RDF statement in RDF2TM. The real challenges arise in other areas, as the following sections describe.

2. Identity

Both Topic Maps and RDF use URIrefs as identifiers (for subjects and resources, respectively). However, in Topic Maps there are two ways in which a URIref can be used to identify a subject:

  • directly, as the locator of an information resource that is the subject, in which case the URIref is called a subject locator; or
  • indirectly, as the locator of an information resource that provides some human-interpretable indication of the subject, in which case it is called a subject identifier.

In Topic Maps, it is always clear whether the URIref is a subject locator or a subject identifier.

In RDF, URIrefs are also widely used in both modes of identification, i.e., direct and indirect. However, as [Pepper et al] points out, the RDF model does not provide a way to distinguish between the two modes of usage. This, in turn, is due to the fact that RDF does not recognize the fundamental ontological distinction between resources in general (which correspond to subjects in Topic Maps) and information resources (corresponding to what has been termed “addressable subjects” in Topic Maps); the latter are resources that are network retrievable (i.e., “web documents”) and thus directly addressable. They constitute a subset of resources in general.

Since RDF does not make the distinction between direct and indirect forms of identification, the question arises, when going from RDF to Topic Maps (RDF2TM), whether to map the URIref of a resource to a subject locator or to a subject identifier; and, conversely, in going from Topic Maps to RDF (TM2RDF), whether to map subject locators or subject identifiers (or neither, or both) to the URIrefs of resources. Any solution which privileges one type of identifier will lead to unnatural results with identifiers of the other type.

The approach adopted by [Guidelines] is to retain some of the ambiguity of the RDF approach, while at the same time providing for round-tripping, by formalizing the concept of “Information Resource”. The URIrefs of resources that are instances of this class are interpreted as subject locators in RDF2TM; the URIrefs of all other resources are interpreted as subject identifiers. Conversely, topics with subject locators are mapped to resources that are asserted to be instances of Information Resource, thus ensuring that there is sufficient information for round-tripping.

The rules are stated as follows (see Note 2):

    TM2RDF
  • When a topic has one or more subject locators, one subject locator becomes the URIref of the RDF node and the resource is made an instance of rdftm:InformationResource. Additional subject locators become owl:sameAs properties. Any subject identifiers become rdftm:subjectIdentifier properties.
  • When a topic has one or more subject identifiers and no subject locators, one subject identifier becomes the URIref of the RDF node. Any additional subject identifiers become rdftm:subjectIdentifier properties.
    RDF2TM
  • A URIref of a resource that is an instance of rdftm:InformationResource becomes a subject locator. Any owl:sameAs properties become additional subject locators. Any rdftm:subjectIdentifier properties become subject identifiers.
  • A URIref of a resource that is not an instance of rdftm:InformationResource becomes a subject identifier. Any additional rdftm:subjectIdentifier properties become subject identifiers.

When a topic has multiple subject locators (see Note 3), owl:sameAs can be used to state their equivalence. However, because of the semantics of owl:sameAs, it cannot be used to state equivalence between two subject identifiers since the subject indicators to which these resolve are in general not the same resource. Instead, the rdftm:subjectIdentifier property is used to handle situations with multiple subject identifiers (and also the unusual case of a topic having both a subject locator and a subject identifier).
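The TM2RDF identity rules can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, triples are plain tuples of strings, and the choice of the first locator (or identifier) as the node's URIref is an assumption, since the rules only say that "one" of them is used.

```python
def topic_identity_to_rdf(subject_locators, subject_identifiers):
    """Return (node URIref, identity triples) for a topic, TM2RDF direction."""
    locators = list(subject_locators)
    identifiers = list(subject_identifiers)
    triples = []
    if locators:
        # Rule 1: a subject locator names the node, which is declared to be
        # an information resource so that RDF2TM can recover the locator.
        node = locators[0]  # assumption: pick the first one
        triples.append((node, "rdf:type", "rdftm:InformationResource"))
        triples += [(node, "owl:sameAs", loc) for loc in locators[1:]]
        triples += [(node, "rdftm:subjectIdentifier", sid) for sid in identifiers]
    elif identifiers:
        # Rule 2: no locators, so a subject identifier names the node.
        node = identifiers[0]  # assumption: pick the first one
        triples += [(node, "rdftm:subjectIdentifier", sid) for sid in identifiers[1:]]
    else:
        node = None  # topics without any identity are outside the stated rules
    return node, triples
```

The absence of an rdftm:InformationResource typing statement in the second branch is what tells an RDF2TM translator to treat the URIref as a subject identifier.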

3. Reification

RDF provides a facility called reification which allows statements to be made about statements. Topic Maps also provides a facility called reification which allows names, associations, occurrences, roles, and topic maps to be regarded as topics in order for assertions to be made about them. While these two facilities might seem to be equivalent, on closer inspection they turn out to be quite different:

  • In RDF, reification applies to the actual statement, that is, the subject-predicate-object triple itself, not to the relation that the triple represents.
  • In Topic Maps, on the other hand, reification allows assertions to be made about the subject represented by the reified statement.

Thus, the association

bio:born-in( puccini : person, lucca : place )

could be reified in order to state, for example, that the relationship thus described commenced on 22 December 1858. However, the kinds of statements that could legitimately be made about the reification of the corresponding RDF triple,

:puccini  bio:born-in  :lucca  .

are that it (i.e., the triple) occurs in a certain RDF graph, that it was asserted by a particular person on a given date, etc. In other words, although the relation (or relationship) represented by the association and the RDF statement are the same (i.e., the birth of Puccini in Lucca), the subjects that result from reifying them are different. This becomes clear if one considers the RDF reification vocabulary, which uses the class rdf:Statement, thus indicating that the result of reifying a statement is still a statement, not the relation represented by the statement.

The solution adopted by [Guidelines] to the problem of reification is to reuse the RDF reification vocabulary, but to define a new class, rdftm:Relation, which is used instead of rdf:Statement, as follows:

bio:born-in( puccini : person, lucca : place )
  ~puccini-birthplace

translates to

:puccini-birthplace
  rdf:type       rdftm:Relation ;
  rdf:subject   :puccini ;
  rdf:predicate  bio:born-in ;
  rdf:object    :lucca .
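As a rough Python sketch of this translation step (the function name and the tuple representation of triples are hypothetical), reifying a relation amounts to emitting the three standard RDF reification properties together with the rdftm:Relation type:

```python
def reify_relation(node, triple):
    """Describe the relation represented by `triple` as the resource `node`.

    The RDF reification properties are reused, but the node is typed as
    rdftm:Relation rather than rdf:Statement, so that assertions about it
    concern the relationship and not the triple itself.
    """
    subject, predicate, obj = triple
    return [
        (node, "rdf:type", "rdftm:Relation"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]

# Usage with the example from the text.
reified = reify_relation(":puccini-birthplace",
                         (":puccini", "bio:born-in", ":lucca"))
```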

4. Non-binary relationships

Binary associations are relatively easy to map to RDF, as we have shown, but Topic Maps allows associations of any arity. Work has recently been undertaken by the SWBPD's Ontology Engineering and Patterns Task Force [Noy et al] to define patterns for describing n-ary relations in RDF. In order to ensure maximum interoperability, [Guidelines] builds on that work and supports the same patterns.

[Noy et al] identifies two basic representation patterns for n-ary relationships: Pattern 1, which they term Introducing a new class for a relation, and Pattern 2, Using lists for arguments in a relation. Of these, only Pattern 1 corresponds to the Topic Maps notion of n-ary relationships. Pattern 2, representing “a list or sequence of arguments”, is most naturally represented in Topic Maps using multiple, chained associations.

[Noy et al] provides three use cases for Pattern 1. All of them involve defining a class to represent the property, and a resource that is an instance of that class to represent the relationship. Each “argument of the relation” (i.e., role player in the relationship) is then linked to the resource that represents the relationship using a single RDF statement. However, there are two subtly different structures that can be employed:

  • In the first two use cases, one of the participants in the relation is the subject of the statement connecting it to the relation, while the others are objects of their respective statements.
  • In the third use case, every participant is the object of the statement connecting it to the relation.

In the first two use cases, one of the role players is regarded as “standing out” in some way, of being a “distinguished participant” that plays a more central role as the subject of the relationship (pattern 1A). In the third use case, all participants are regarded as being equally important; none of them can be regarded as being the “subject” of the relationship (pattern 1B).

In order to know which of these two patterns to use when translating a non-binary association to RDF, the guidance properties rdftm:subject-role and rdftm:object-role, described above in connection with binary associations, are used. The rules for translating non-binary associations are expressed in prose as follows:

    TM2RDF
  • A non-binary association results in a blank node whose type is set to the association type, which itself becomes an instance of rdftm:N-aryRelation.
  • A role whose type is the value of an rdftm:subject-role property results in a statement whose subject is the role-player; the role type becomes the predicate and an instance of rdftm:RoleProperty; the blank node becomes the object.
  • Each role whose type is the value of an rdftm:object-role property results in a statement whose subject is the blank node; the role type becomes the predicate and an instance of rdftm:RoleProperty; the role-player becomes the object.
    RDF2TM
  • A blank node whose type is an instance of rdftm:N-aryRelation becomes an n-ary association whose type is the same as the type of the blank node.
  • Each statement in which the blank node is either the subject or the object, and whose property is an instance of rdftm:RoleProperty, results in a role; the statement's property becomes the role type.

The following example shows how a ternary association with no “distinguished participant” is translated to RDF:

ex:killed-by( scarpia  : ex:victim,
              floria   : ex:perpetrator,
              stabbing : ex:method )

becomes

[ rdf:type        ex:killed-by ;
  ex:victim      :scarpia ;
  ex:perpetrator :floria ;
  ex:method      :stabbing ] .

ex:killed-by  rdf:type  rdftm:N-aryRelation .

The task force is still discussing whether pattern 1A really is an appropriate representation of an n-ary association since there is nothing in Topic Maps that corresponds to the notion of a distinguished participant.
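The TM2RDF rules for non-binary associations can be sketched in Python as follows. The function name, the fixed blank node label, and the tuple representation of triples are assumptions for illustration. As a simplification, every role whose type is not the designated subject role is treated as an object role.

```python
def nary_association_to_rdf(assoc_type, roles, subject_role=None, bnode="_:b0"):
    """Translate a non-binary association into RDF triples.

    roles:        list of (role type, role player) pairs
    subject_role: role type named by rdftm:subject-role, or None when there
                  is no distinguished participant (pattern 1B)
    """
    triples = [
        (bnode, "rdf:type", assoc_type),
        (assoc_type, "rdf:type", "rdftm:N-aryRelation"),
    ]
    for role_type, player in roles:
        if role_type == subject_role:
            # Pattern 1A: the distinguished participant points at the relation.
            triples.append((player, role_type, bnode))
        else:
            # The relation node points at the participant.
            triples.append((bnode, role_type, player))
        typing = (role_type, "rdf:type", "rdftm:RoleProperty")
        if typing not in triples:
            triples.append(typing)
    return triples
```

Calling the function without a subject role yields the pattern 1B structure, in which the blank node is the subject of every role statement.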

5. Scope

Topic Maps defines the concept of scope as “the context within which a statement is valid.” A scope is composed of a set of topics that together define that context. RDF has no equivalent concept, nor does it define any vocabulary for the representation of context.

Scope is essentially a way to annotate a name, occurrence, or association for the specific purpose of expressing its contextual validity; in other words, it is a special case of making an assertion about the relationship represented by a statement. Since the more general case involves reification, [Guidelines] simply defines a property rdftm:scope that is to be used with reified statements in order to allow scope to be expressed in RDF.

The rules are expressed as follows:

    TM2RDF
  • A scoped name, occurrence, or binary association is reified [according to the rules defined elsewhere] and the resulting blank node is assigned one rdftm:scope property for each topic in the statement's [scope] property.
  • A scoped non-binary association is translated according to [the pattern described elsewhere] and the node representing the reified relation is assigned one rdftm:scope property for each topic in the statement's [scope] property.
    RDF2TM
  • A blank node of type rdftm:Relation (or its subtype rdftm:N-aryRelation) with one or more rdftm:scope properties becomes a scoped statement.

The following example shows how a scoped name is translated to RDF:

[puccini = foaf:name : "Giacomo Puccini" / ex:foo ex:bar]

becomes

:puccini  foaf:name  "Giacomo Puccini" .
foaf:name rdf:type   rdftm:NameProperty .

[ rdf:type       rdftm:Relation ;
  rdf:subject   :puccini ;
  rdf:predicate  foaf:name ;
  rdf:object    "Giacomo Puccini" ;
  rdftm:scope    ex:foo , ex:bar
] .
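A minimal Python sketch of the scope rules, in both directions, might look like this. The function names, the fixed blank node label, and the tuple representation of triples are hypothetical; the guidance statement for the property is omitted for brevity.

```python
def scoped_statement_to_rdf(triple, scope, bnode="_:b0"):
    """TM2RDF: emit the plain statement, its reification, and its scope."""
    subject, predicate, obj = triple
    triples = [
        triple,
        (bnode, "rdf:type", "rdftm:Relation"),
        (bnode, "rdf:subject", subject),
        (bnode, "rdf:predicate", predicate),
        (bnode, "rdf:object", obj),
    ]
    # One rdftm:scope property per topic in the statement's scope.
    triples += [(bnode, "rdftm:scope", topic) for topic in scope]
    return triples

def scope_of(triples, bnode):
    """RDF2TM: collect the scoping topics attached to a reified relation."""
    return [o for s, p, o in triples if s == bnode and p == "rdftm:scope"]
```

An unscoped statement needs no reification at all, so a translator would normally call the first function only when the scope is non-empty.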

6. Variant names

In Topic Maps, a name can have variants. A variant is an alternate form of the name that is intended to be used in a specific processing context, which itself is specified as a scope. No equivalent construct exists in RDF. A number of approaches were considered by the RDFTM task force, which came to the following conclusion:

Any solution that inserts an extra node between a resource and the literal that is its name is very unnatural and should be avoided. This includes both the use of a complex object to represent the name as a whole and the use of collections or containers. In addition, some usages of RDF collections (and containers) lead to losing the connection between the base name and its variant(s) and thus impact roundtripping.

The only viable alternative is to use a single property for the base name and to reify the statement in order to attach variants. This requires the addition of the rdftm:variant, rdftm:value, and rdftm:variantScope properties to the translation vocabulary. The rules are expressed as follows:

    TM2RDF
  • A topic name that has one or more variants results in a statement as described above for names in general.
  • That statement is reified using the RDFTM reification vocabulary [described elsewhere].
  • The resulting blank node is assigned one rdftm:variant property for each variant. The value of each such property is a blank node of type rdftm:Variant, which is assigned an rdftm:value property for the variant name's [value] property and one rdftm:scope property for each topic in the variant name's [scope] property.
    RDF2TM
  • A blank node that has RDFTM reification properties and an rdftm:variant property results in a topic name with a variant.

The following example shows how to translate a name with a sort variant:

[puccini = foaf:name : "Giacomo Puccini" ;
    "puccini, giacomo"]

translates to

:puccini  foaf:name  "Giacomo Puccini" .
foaf:name rdf:type   rdftm:NameProperty .

[ rdf:type        rdftm:Relation ;
  rdf:subject    :puccini ;
  rdf:predicate   foaf:name ;
  rdf:object     "Giacomo Puccini" ;
  rdftm:variant
  [ rdf:type      rdftm:Variant ;
    rdftm:value  "puccini, giacomo" ;
    rdftm:scope   tm:sort ]
] .

The first two lines of the Turtle code shown above represent the translation of the topic name as it would appear if there were no variant, as a single (foaf:name) statement and a guidance statement indicating that this property is a name property. The next four lines reify the basic foaf:name statement using the RDFTM reification vocabulary (note the use of rdftm:Relation instead of rdf:Statement). The final four lines relate the resulting blank node to another blank node that represents the variant.
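The same translation can be sketched in Python. The function name, the generated blank node labels, and the tuple representation of triples are assumptions for illustration only; the sketch follows the three TM2RDF rules given above.

```python
import itertools

def name_with_variants_to_rdf(topic, name_property, value, variants):
    """Translate a topic name and its variants into RDF triples.

    variants: list of (variant value, list of scoping topics) pairs
    """
    fresh = (f"_:b{i}" for i in itertools.count())  # blank node labels
    name_node = next(fresh)
    triples = [
        # The name as a plain statement, plus its guidance statement.
        (topic, name_property, value),
        (name_property, "rdf:type", "rdftm:NameProperty"),
        # Reification of that statement using the RDFTM vocabulary.
        (name_node, "rdf:type", "rdftm:Relation"),
        (name_node, "rdf:subject", topic),
        (name_node, "rdf:predicate", name_property),
        (name_node, "rdf:object", value),
    ]
    for variant_value, scope in variants:
        # Each variant becomes its own blank node hanging off the name node.
        variant_node = next(fresh)
        triples.append((name_node, "rdftm:variant", variant_node))
        triples.append((variant_node, "rdf:type", "rdftm:Variant"))
        triples.append((variant_node, "rdftm:value", variant_value))
        triples += [(variant_node, "rdftm:scope", t) for t in scope]
    return triples
```

Because each variant hangs off the reified name statement rather than sitting between the resource and its name literal, the plain name statement remains as natural as it would be without variants.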

7. Conclusion

This paper has described the approach taken by the RDF/TM Interoperability Task Force to solving some of the trickier aspects of translating data from RDF to Topic Maps and vice versa. It shows that acceptable solutions can be found even to these problems within the framework of a semantic mapping that both preserves the information necessary for round-tripping and produces natural results in terms of the target paradigm.

Notes

1. All examples in this paper are given using the compact syntaxes [LTM] and [Turtle] for Topic Maps and RDF, respectively.

2. The RDFTM Task Force is currently working on a formal expression of these rules.

3. This was not permitted in earlier versions of the Topic Maps standard, but the 2006 revision now allows it.


Bibliography

[Ciancarini et al] Ciancarini, Paolo; Gentilucci, Riccardo; Pirruccio, Marco; Presutti, Valentina; Vitali, Fabio: Metadata on the Web: On the integration of RDF and Topic Maps, http://www.idealliance.org/papers/extreme03/html/2003/Presutti01/EML2003Presutti01.html (2003)

[Garshol] Garshol, Lars Marius: Topic maps, RDF, DAML, OIL: A comparison, http://www.ontopia.net/topicmaps/materials/tmrdfoildaml.html (2001)

[Garshol] Garshol, Lars Marius: Living with Topic Maps and RDF, http://www.ontopia.net/topicmaps/materials/tmrdf.html (2003)

[Gentilucci et al] Gentilucci, Riccardo; Pirruccio, Marco: Metainformazioni sul World Wide Web: Conversione di formato e navigazione, University of Bologna, Master's thesis (2002; in print; in Italian)

[Guidelines] Pepper, Vitali, Garshol, Presutti: Guidelines for RDF/Topic Maps Interoperability, http://www.ontopia.net/work/guidelines.html (W3C Editors' Working Draft, June 2006)

[Lacher et al] Lacher, Martin S.; Decker, Stefan: On the Integration of Topic Maps and RDF Data, http://www.idealliance.org/papers/extreme03/html/2001/Lacher01/EML2001Lacher01-toc.html (2001)

[LTM] Garshol, Lars Marius: The Linear Topic Map Notation: Definition and introduction, version 1.2, http://www.ontopia.net/download/ltm.html (2002)

[Moore] Moore, Graham: RDF and Topic Maps: An exercise in convergence, http://xml.coverpages.org/moore-topicmapsrdf200105.pdf (2001)

[Noy et al] Noy, Natasha; Rector, Alan: Defining N-ary Relations on the Semantic Web: Use With Individuals, http://www.w3.org/TR/swbp-n-aryRelations/ (2004)

[Ogievetsky] Ogievetsky, Nikita: XML Topic Maps through RDF glasses, http://www.cogx.com/rdfglasses.html (2001)

[Pepper et al] Pepper, Steve; Schwab, Sylvia: Curing the Web's Identity Crisis: Subject Indicators for RDF, http://www.ontopia.net/topicmaps/materials/identitycrisis.html (2003)

[Survey] Pepper, Vitali, Garshol, Gessa, Presutti: A Survey of RDF/Topic Maps Interoperability Proposals, http://www.w3.org/TR/rdftm-survey/ (W3C Working Draft 2005)

[Turtle] Beckett, Dave: Turtle - Terse RDF Triple Language, http://www.dajobe.org/2004/01/turtle/ (2006)


