Q: A model for topic maps: Unifying RDF and topic maps

Lars Marius Garshol
larsga@ontopia.net

Abstract

This paper describes a formal model for topic maps called Q, and structurally similar representations of topic maps and RDF in this formal model.

Keywords: Topic Maps; RDF; Modeling

Lars Marius Garshol

Lars Marius Garshol is currently Development Manager at Ontopia, a leading topic map software vendor. He has been active in the XML and topic map communities as a speaker, consultant, open source developer, and technology creator for a number of years. He helped develop the standard SAX API for XML development, translated it to Python, and wrote an open-source validating XML parser in Python.

Lars Marius has also been responsible for adding Unicode support to the Opera web browser. His book, Definitive XML Application Development, was published by Prentice-Hall in its Charles Goldfarb series. Lars Marius is one of the editors of the ISO Topic Map Query Language standard, and also co-editor of the Topic Map Data Model.

Lars Marius Garshol [Development Manager; Ontopia]

Extreme Markup Languages 2005® (Montréal, Québec)

Copyright © 2005 Lars Marius Garshol. Reproduced with permission.

Introduction

This paper proposes a simple formal model for topic maps called the Q model, which can informally be thought of as RDF triples extended with a fourth element representing the identity of the triple, and a fifth element representing the context. This enables the model to efficiently represent statements about other statements, which is the key to being able to represent topic maps compactly. It also enables the model to make each statement unique in the model. Q can thus be used to efficiently implement combined topic map/RDF engines.

The representation of topic maps in Q uses a vocabulary of properties and other values that together enable topic maps as defined in TMDM to be represented in Q. This means that Q is not itself seen as a formalization of topic maps; it is just a mechanism used for that formalization.

This paper assumes the reader is quite familiar with both topic maps and RDF; readers who are not may prefer to first read an earlier paper on this subject [Garshol03].

Goals

There have been several different, and at times conflicting, goals for the work on the Q model:

  • To find a simple, formal model for representing topic maps that is suitable for topic maps research in general, and especially for formally defining tolog and OSL.
  • To find a unifying model for topic maps and RDF.
  • To find a simple and efficient representation of topic maps that can be used as the basis for topic maps implementations.

In this paper, the emphasis is very much on finding a unifying model for topic maps and RDF, and a paper that had a different focus might have produced a different representation of topic maps.

Informal overview

Modelling topic maps in RDF would solve the RDF/TM interoperability problem at a stroke, and several attempts at this have been made. The most notable are [Garshol02] and [Cregan05], but neither can really be said to have achieved a natural result, in the sense that the RDF produced is wholly unlike the natural RDF expression of the topic map information.

The problem is best explained with an example. Let's say we want to assert that my name is "Lars Marius Garshol", and that the sort form of this name is "garshol, lars marius". The most natural expression of this in RDF would be:

Figure 1
(lmg, foaf:name, "Lars Marius Garshol")
(lmg, :sort-name, "garshol, lars marius")

In topic maps this would be represented by a topic name with the value "Lars Marius Garshol" attached to the topic representing me, and a variant of this topic name with the value "garshol, lars marius". If we represent this directly in RDF using a straightforward object mapping we get:

Figure 2
(lmg, TOPIC_NAME, tn)
(tn, VALUE, "Lars Marius Garshol")
(tn, VARIANT, vn)
(vn, VALUE, "garshol, lars marius")
(vn, SCOPE, sort-topic)

This is very heavyweight, and also entirely unlike the natural RDF expression of the same information. The reason for the difference is that we need to be able to express five things about the topic name: its string value, its scope, its variants, its type, and its reifiers. This leaves us with two alternatives in RDF: creating a blank node for the topic name, or using RDF reification. Both alternatives are too heavyweight for real use, especially when applied to associations.

One way to solve this would be using four elements instead of three. A model instance would then be a set of quadruples (or quads) reminiscent of RDF triples, of the form (subject, predicate, identity, object), where the identity can be used as the subject in other quads to make statements about the quad. This allows us to restate the example above as:

Figure 3
(lmg, TOPIC_NAME, tn, "Lars Marius Garshol")
(tn, VARIANT, vn, "garshol, lars marius")
(vn, SCOPE, x, sort-topic)

This may not look like a very big saving, but in the normal case where there are no variant names it cuts the number of statements required in half, and for other parts of the model the savings are even greater. Further, this is much closer to the natural RDF expression, making an alignment much more likely. However, as will be seen, some problems remain.

Related work

Substantial amounts of work have already been done both on formalizing topic maps and on the relationship between topic maps and RDF.

The primary work on the formalization of topic maps is what is now becoming ISO 13250-5: Topic Maps — Reference Model [TMRM], based on earlier work stretching back to late 2000. This work is now being merged with Robert Barta's formal Tau model for topic maps [Barta05].

Among other work on this front must be mentioned Dmitry Bogachev's BAM [Bogachev04], and Graham Moore's unpublished work on quads (similar to the work presented in this paper, though as it is unpublished the full degree of the similarity is not known). Finally, there is [CWEB], about which this author knows only that it uses quads to represent topic maps and RDF.

Work has also been done on quad representations of RDF, but for the most part the fourth element has been used to represent the "context" of the statement. The interpretation of this fourth element has varied from the RDF document the statement came from to the provenance of the statement [Khriyenko05][Tolle04]. There are also quad representations of RDF where the fourth element represents the statement identity, but no references to such work appear to exist [Carroll05].

As for previous work on the relationship between topic maps and RDF the reader is referred to the survey of such work found in [RDFTM].

A naïve approach

In this section we are going to make a first, naïve, attempt at defining the Q model and representing topic maps and RDF in it. To be more specific, the goals are to define a formal model (Q), and representations in it of RDF and topic maps such that:

  • Any TMDM instance can be transformed into a Q instance, and this Q instance can be transformed back to an equal TMDM instance with no loss of information.
  • Any RDF model can be transformed into a Q instance, and this Q instance can be transformed back to an equivalent RDF model with no loss of information.
  • Any Q instance created from RDF or topic maps can be transformed to both RDF and topic maps, even if this may require some annotation to be added to the Q instance.
  • The tolog topic maps query language and the SPARQL RDF query language can be defined directly on top of Q, in such a way that both can query topic maps and RDF Q instances.

This, as it turns out, is more difficult than it seems. For ease of exposition we will start with a straightforward attempt, and then discuss the problems with it, before presenting a better solution.

The core model

In this section we make a first attempt, which we call Q4, at defining the model.

Let I be the set of all identifiers, where an identifier is an object that is completely opaque and has no other properties than being distinct from all other identifiers. Let L be the set of all literals, where a literal is an atomic value such as a string or a number. Let A be the set of all atoms, where A = I ∪ L. A model is a subset of the set I × I × I × A.

In other words, a model is a set of four-tuples, or "quads", for short. The meaning of a quad can be thought of as

(subject, predicate, identity, object)

which is highly similar to RDF triples. The difference is that the third element of one quad can be used in the first position of another to efficiently make statements about the first quad. This, as will be seen, is the key to being able to efficiently represent topic maps.

A valid model M further meets the following constraints:

  • No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that (a ≠ w or b ≠ x or d ≠ z) and c = y. (Or, more informally, the third element in each quad must be unique within the model.)
  • No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that a = w and b = x and d = z and c ≠ y. (Or, more informally, the same quad cannot appear twice with different identifiers.)
  • No pair of quads (a, b, c, d) and (w, x, y, z) in M can exist such that b = y. (Or, more informally, the identity of a quad cannot be used as a predicate.)
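
As an illustration only, the three constraints can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the names Quad and is_valid are hypothetical and not part of the model, identifiers are represented as plain strings, and no distinction is made between identifiers and literals.

```python
from itertools import combinations
from typing import Hashable, NamedTuple


class Quad(NamedTuple):
    subject: Hashable
    predicate: Hashable
    identity: Hashable
    obj: Hashable  # the "object" position; may be an identifier or a literal


def is_valid(model: set) -> bool:
    """Return True if the set of quads meets the three constraints above."""
    identities = {q.identity for q in model}
    # Third constraint: a quad identity is never used as a predicate.
    if any(q.predicate in identities for q in model):
        return False
    for q1, q2 in combinations(model, 2):
        # First constraint: two different quads never share an identity.
        if q1.identity == q2.identity:
            return False
        # Second constraint: the same statement never appears under two identities.
        if (q1.subject, q1.predicate, q1.obj) == (q2.subject, q2.predicate, q2.obj):
            return False
    return True
```

Because the model is a Python set, two quads that agree in all four positions are already a single element, so only quads that differ somewhere need to be compared pairwise.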

Operations

We define a set of operations on the model that will allow us to work with the model more conveniently. These operations will be used in the transformation from Q4 to TMDM.

The subscript operator, written as a [n] postfix, extracts the nth element in a quad. This allows us to define the following functions for accessing the components of quads:

subj(q) = q[1]
pred(q) = q[2]
id(q)   = q[3]
val(q)  = q[4]

A filtering function ɸ which produces a subset of a model matching a certain pattern is defined as follows:

ɸ(M, w, x, y, z)
  = {q ∈ M | (w=* or subj(q)=w) and
     (x=* or pred(q)=x) and
     (y=* or id(q)=y) and
     (z=* or val(q)=z)}

The parameter * is a wildcard that matches any atom, which greatly increases the flexibility and utility of the function.

A selection function σ selecting all atoms appearing in a given position in a given model is defined as:

σ(M, n)
  = {z ∈ A | ∃q ∈ M : q[n]=z}

A second filter function φ produces the subset of a model which has a particular subject and one of a set of properties. It is defined as:

φ(M, s, P)
  = {q ∈ M | subj(q) = s and pred(q) ∈ P}
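
The operations translate almost directly into code. The sketch below is one possible Python rendering, assuming quads are plain 4-tuples and a model is a set of them; the names ANY, ident, filter_quads, select and filter_props are hypothetical stand-ins for *, id, ɸ, σ and φ.

```python
ANY = object()  # stands in for the wildcard *


def subj(q): return q[0]
def pred(q): return q[1]
def ident(q): return q[2]   # id(q) in the text; renamed to avoid Python's built-in id
def val(q): return q[3]


def filter_quads(model, w, x, y, z):
    """ɸ(M, w, x, y, z): the quads of model matching the pattern."""
    pattern = (w, x, y, z)
    return {q for q in model
            if all(p is ANY or p == part for p, part in zip(pattern, q))}


def select(model, n):
    """σ(M, n): every atom found at position n (1-based, as in the text)."""
    return {q[n - 1] for q in model}


def filter_props(model, s, props):
    """φ(M, s, P): quads with subject s and a predicate drawn from props."""
    return {q for q in model if subj(q) == s and pred(q) in props}
```

A dedicated sentinel object is used for the wildcard rather than the string "*", so that a literal consisting of an asterisk can still be matched exactly.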

Datatypes

RDF and topic maps both support datatyped literal values, and so Q needs to support the same. A datatype in Q is defined as a quadruple (not part of the model) (u, i, i⁻¹, V), where u is the URI identifying the datatype, V is the set of all values of this type, i is a function S → V (where S is the set of all strings), and i⁻¹ is a function V → S. V must be disjoint from the value sets of all other datatypes, and it must hold that V ⊂ L. Further, for all v ∈ V it must hold that v = i(i⁻¹(v)).

The function i, together with its inverse, is the lexical-to-value mapping for the datatype, which is used to map the lexical (string) representation of values to the values themselves.

Finally, the function I(u) produces the interpretation function (i) given the URI of a datatype, while I'(v) produces the inverse (i⁻¹) for the type to which v belongs. Note that the notation f(x)(y) is used to mean g(y) where g = f(x).
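
To make the curried notation concrete, here is a small Python sketch of the datatype machinery. It is an assumption-laden illustration: the Datatype record, the two registries, and the use of a Python type as a stand-in for the value set V are all hypothetical, and only two XML Schema datatypes are registered.

```python
from typing import Callable, NamedTuple


class Datatype(NamedTuple):
    uri: str                              # u
    to_value: Callable[[str], object]     # i   : S -> V
    to_lexical: Callable[[object], str]   # i⁻¹ : V -> S
    python_type: type                     # stands in for the value set V


XSD = "http://www.w3.org/2001/XMLSchema#"
DATATYPES = [
    Datatype(XSD + "string", str, str, str),
    Datatype(XSD + "integer", int, str, int),
]
BY_URI = {d.uri: d for d in DATATYPES}
BY_TYPE = {d.python_type: d for d in DATATYPES}


def I(uri):
    """I(u): the interpretation function i of the datatype with URI u."""
    return BY_URI[uri].to_value


def I_inverse(value):
    """I'(v): the inverse function of the datatype to which v belongs."""
    return BY_TYPE[type(value)].to_lexical
```

With these definitions I(XSD + "integer")("42") is read exactly as f(x)(y) above: first look up the function, then apply it to the lexical form. Looking up the datatype by the exact Python type of the value relies on the value sets being disjoint, as the definition requires.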

Representing topic maps in Q

This section shows how topic maps can be represented in Q4 by showing how TMDM instances can be transformed into a Q4 representation, and how Q4 instances using the same Q vocabulary can be transformed into TMDM instances. This demonstrates that the representation loses no information, since a full roundtrip of all information is possible. Throughout this section identifiers written in UPPER_CASE are used to denote identifiers from a Q vocabulary used to represent TMDM. The symbol _ is used to denote a quad identifier which will not be referenced further.

Note, however, that the representation of topic maps shown in this section has a number of problems, as hinted earlier. These are discussed in “Differences and problems”.

Transforming TMDM to Q4

To transform a TMDM instance to Q4, start with the empty model. Then let the topic map item be tm, and

  • for each value l in tm.[item identifiers] add the quad (tm, ITEM_IDENTIFIER, _, l),
  • for each value t in tm.[topics], follow the procedure for topic items below, and
  • for each value a in tm.[associations], follow the procedure for association items below.

The procedure for any topic item t is:

  • add the quad (tm, TOPIC, _, t),
  • for each value n in t.[topic names] add the quad (t, n.[type], n, n.[value]); then follow the procedure for topic name items below,
  • for each value o in t.[occurrences] add the quad (t, o.[type], o, I(o.[datatype])(o.[value])); then follow the procedure for occurrence items below,
  • if there is a value in t.[reified] add the quad (t, REIFIES, _, t.[reified]),
  • for each value l in t.[subject identifiers] add the quad (t, SUBJECT_IDENTIFIER, _, l),
  • for each value l in t.[subject locators] add the quad (t, SUBJECT_LOCATOR, _, l), and
  • for each value l in t.[item identifiers] add the quad (t, ITEM_IDENTIFIER, _, l).

The procedure for any topic name item n is:

  • add the quad (n.[type], META_TYPE, _, TOPIC_NAME),
  • for each value t in n.[scope] add the quad (n, SCOPE, _, t),
  • for each value v in n.[variants] add the quad (n, VARIANT, v, I(v.[datatype])(v.[value])), then follow the procedure for variant items below, and
  • for each value l in n.[item identifiers] add the quad (n, ITEM_IDENTIFIER, _, l).

The procedure for any variant item v is:

  • for each value t in v.[scope] add the quad (v, SCOPE, _, t),
  • for each value l in v.[item identifiers] add the quad (v, ITEM_IDENTIFIER, _, l).

The procedure for any occurrence item o is:

  • add the quad (o.[type], META_TYPE, _, OCCURRENCE),
  • for each value t in o.[scope] add the quad (o, SCOPE, _, t),
  • for each value l in o.[item identifiers] add the quad (o, ITEM_IDENTIFIER, _, l).

The procedure for any association item a is:

  • add the quad (tm, ASSOCIATION, _, a),
  • add the quad (a, TYPE, _, a.[type]),
  • for each value t in a.[scope] add the quad (a, SCOPE, _, t),
  • for each value r in a.[roles] add the quad (a, r.[type], r, r.[player]), then follow the procedure for association roles below, and
  • for each value l in a.[item identifiers] add the quad (a, ITEM_IDENTIFIER, _, l).

The procedure for any association role item r is:

  • for each value l in r.[item identifiers] add the quad (r, ITEM_IDENTIFIER, _, l).
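
The procedures above can be sketched in code. The following Python fragment covers only a small part of TMDM (topics, topic names with scope, and subject identifiers) and is meant as an illustration, not a complete implementation. The Topic and TopicName classes, the function topics_to_q4, and the generated identities of the form _q1, _q2, ... (standing in for _) are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TopicName:
    id: str
    type: str
    value: str
    scope: frozenset = frozenset()


@dataclass
class Topic:
    id: str
    names: list = field(default_factory=list)
    subject_identifiers: list = field(default_factory=list)


def topics_to_q4(tm, topics):
    """Build Q4 quads for topics with names, scope and subject identifiers."""
    model, stated = set(), set()
    fresh = (f"_q{n}" for n in count(1))  # fresh identities for the "_" positions

    def add(s, p, o, identity=None):
        if identity is None:
            if (s, p, o) in stated:       # do not restate an anonymous quad
                return
            identity = next(fresh)
        stated.add((s, p, o))
        model.add((s, p, identity, o))

    for t in topics:
        add(tm, "TOPIC", t.id)
        for uri in t.subject_identifiers:
            add(t.id, "SUBJECT_IDENTIFIER", uri)
        for n in t.names:
            add(t.id, n.type, n.value, identity=n.id)   # the name item is the quad identity
            add(n.type, "META_TYPE", "TOPIC_NAME")
            for theme in sorted(n.scope):
                add(n.id, "SCOPE", theme)
    return model
```

The helper add skips an anonymous quad that has already been stated, so that a name type shared by many names yields a single META_TYPE quad. Quads that carry an explicit identity are always added, which is a design choice of this sketch.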

Transforming Q4 to TMDM

For transforming a Q4 model to TMDM to be possible the Q4 model needs to use the Q vocabulary for representing TMDM defined in the previous section. In the following we will assume this without further comment. The transformation is robust in the presence of additional non-TMDM information.

To transform a Q4 model M to TMDM, the first step is to do a little analysis on the model in order to know which predicates are name predicates, and which are occurrence predicates. This is done as follows:

N = σ(ɸ(M, *, META_TYPE, *, TOPIC_NAME), 1)
O = σ(ɸ(M, *, META_TYPE, *, OCCURRENCE), 1) 

Topic map item

Now we can find the identity of the topic map. This is done with

tm = σ(ɸ(M, *, TOPIC, *, *), 1)

We can then construct the topic map item as follows:

Figure 4
tm.[item identifiers] = σ(ɸ(M, tm, ITEM_IDENTIFIER, *, *), 4)
tm.[topics]           = σ(ɸ(M, tm, TOPIC, *, *), 4)
tm.[associations]     = σ(ɸ(M, tm, ASSOCIATION, *, *), 4)

Topic items

For each element t in

σ(ɸ(M, *, TOPIC, *, *), 4)

create a topic item as follows:

Figure 5
t.[topic names]         = σ(φ(M, t, N), 3)
t.[occurrences]         = σ(φ(M, t, O), 3)
t.[subject identifiers] = σ(ɸ(M, t, SUBJECT_IDENTIFIER, *, *), 4)
t.[subject locators]    = σ(ɸ(M, t, SUBJECT_LOCATOR, *, *), 4)
t.[item identifiers]    = σ(ɸ(M, t, ITEM_IDENTIFIER, *, *), 4)
t.[parent]              = tm

Topic name items

For each element n in

σ(φ(M, t, N), 3)

create a topic name item as follows:

Figure 6
n.[value]            = σ(ɸ(M, t, *, n, *), 4)
n.[type]             = σ(ɸ(M, t, *, n, *), 2)
n.[scope]            = σ(ɸ(M, n, SCOPE, *, *), 4)
n.[variants]         = σ(ɸ(M, n, VARIANT, *, *), 3)
n.[item identifiers] = σ(ɸ(M, n, ITEM_IDENTIFIER, *, *), 4)
n.[parent]           = t

Variant items

For each element v in

σ(ɸ(M, n, VARIANT, *, *), 3)

create a variant item as follows:

Figure 7
v.[value]            = σ(ɸ(M, n, *, v, *), 4)
v.[scope]            = σ(ɸ(M, v, SCOPE, *, *), 4)
v.[item identifiers] = σ(ɸ(M, v, ITEM_IDENTIFIER, *, *), 4)
v.[parent]           = n

Occurrence items

For each element o in

σ(φ(M, t, O), 3)

create an occurrence item as follows:

Figure 8
o.[value]            = σ(ɸ(M, t, *, o, *), 4)
o.[type]             = σ(ɸ(M, t, *, o, *), 2)
o.[scope]            = σ(ɸ(M, o, SCOPE, *, *), 4)
o.[item identifiers] = σ(ɸ(M, o, ITEM_IDENTIFIER, *, *), 4)
o.[parent]           = t

Association items

For each element a in

σ(ɸ(M, *, ASSOCIATION, *, *), 4)

create an association item as follows:

Figure 9
M' = ɸ(M, a, *, *, *) - φ(M, a, {ITEM_IDENTIFIER, TYPE, SCOPE})

a.[type]             = σ(ɸ(M, a, TYPE, *, *), 4)
a.[scope]            = σ(ɸ(M, a, SCOPE, *, *), 4)
a.[roles]            = σ(M', 3)
a.[item identifiers] = σ(ɸ(M, a, ITEM_IDENTIFIER, *, *), 4)
a.[parent]           = tm

Association role items

For each (a, p, r, t) in M' create an association role item as follows:

Figure 10
r.[player]           = t
r.[type]             = p
r.[item identifiers] = σ(ɸ(M, r, ITEM_IDENTIFIER, *, *), 4)
r.[parent]           = a
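
As an illustration of the reverse direction, the following Python sketch rebuilds the topic name items of a single topic from a Q4 model. It assumes quads are 4-tuples in a set and that, as in the quad (t, n.[type], n, n.[value]) produced by the forward transformation, the name item is the identity of its quad. The names ANY, match and names_of are hypothetical, and the result is a plain dictionary rather than a real TMDM item.

```python
ANY = object()  # wildcard


def match(model, s=ANY, p=ANY, i=ANY, o=ANY):
    """Quads of model matching the pattern (s, p, i, o)."""
    pattern = (s, p, i, o)
    return {q for q in model
            if all(w is ANY or w == part for w, part in zip(pattern, q))}


def names_of(model, topic):
    """Rebuild the topic name items attached to one topic."""
    # N: the predicates declared to be name types.
    name_types = {q[0] for q in match(model, p="META_TYPE", o="TOPIC_NAME")}
    items = {}
    for _, name_type, n, value in match(model, s=topic):
        if name_type in name_types:
            items[n] = {
                "type": name_type,
                "value": value,
                "scope": {q[3] for q in match(model, s=n, p="SCOPE")},
                "parent": topic,
            }
    return items
```

Quads about the topic whose predicate is not a declared name type, such as SUBJECT_IDENTIFIER, are simply ignored here, which mirrors the robustness against additional information noted above.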

Q4 and RDF

One might think that representing RDF in Q4 is straightforward, but there is actually a difficulty: URIs are identifiers in RDF, but literals in Q. This means that the most obvious representation is not available. However, the representation is still quite straightforward.

We define a function i(n) which, given an RDF URI reference or a blank node, produces an identifier for it. The function meets the constraint that if n ≠ m then i(n) ≠ i(m). Given this we can define a mapping from an RDF model to Q.

Given an RDF model M, start with an empty Q model, then:

  • for each triple (s, p, o) in M where o is not a literal add the quad (i(s), i(p), _, i(o)),
  • for each triple (s, p, o) in M where o is a literal add the quad (i(s), i(p), x, v), where v = o if o is not typed, and v = I(u)(o) if o is typed, where u is the URI of its datatype; if o has a language tag l, also add the quad (x, LANGUAGE_TAG, _, l), and
  • for each RDF URI reference u in M add the quad (i(u), RDF_URI, _, u).
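
The three steps can be sketched as follows. This Python fragment is an illustration under several assumptions: the classes URI, BlankNode and Literal are hypothetical stand-ins for RDF terms (no RDF library is used), identifiers are generated as id1, id2, ..., fresh quad identities as _q1, _q2, ..., and the interpret parameter plays the role of I(u) with a default that leaves every lexical form as a string. The input is assumed to be a set of distinct triples.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    language: Optional[str] = None


def rdf_to_q4(triples, interpret=lambda datatype_uri: str):
    ids = {}                                  # RDF node -> identifier, i.e. i(n)
    fresh = (f"_q{n}" for n in count(1))      # identities for the "_" and x positions

    def ident(node):
        # A node seen before keeps its identifier; a new node gets a new one.
        return ids.setdefault(node, f"id{len(ids) + 1}")

    model = set()
    for s, p, o in triples:
        if isinstance(o, Literal):
            x = next(fresh)
            v = interpret(o.datatype)(o.lexical) if o.datatype else o.lexical
            model.add((ident(s), ident(p), x, v))
            if o.language:
                model.add((x, "LANGUAGE_TAG", next(fresh), o.language))
        else:
            model.add((ident(s), ident(p), next(fresh), ident(o)))
    for node, identifier in ids.items():
        if isinstance(node, URI):             # blank nodes get no RDF_URI quad
            model.add((identifier, "RDF_URI", next(fresh), node.value))
    return model
```

Frozen dataclasses are used for the terms so that a URI and a blank node with the same text remain distinct dictionary keys, which keeps the mapping from nodes to identifiers injective.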

The final step is to add explicit reification indicators. This could be done either by using the identity of the reified statement as the subject of statements about it, or by using a quad to connect the reified quad with the identifiers reifying it. The first approach loses information, however, as there may be multiple identifiers reifying the same statement, and so we must follow the second. The procedure is as follows, where M is the Q4 model produced by the first step:

Figure 11: Adding reification quads
RDF_TYPE      = σ(ɸ(M, *, RDF_URI, *, rdf:type), 1)
RDF_STATEMENT = σ(ɸ(M, *, RDF_URI, *, rdf:Statement), 1)
RDF_SUBJECT   = σ(ɸ(M, *, RDF_URI, *, rdf:subject), 1)
RDF_PREDICATE = σ(ɸ(M, *, RDF_URI, *, rdf:predicate), 1)
RDF_OBJECT    = σ(ɸ(M, *, RDF_URI, *, rdf:object), 1)

R             = σ(ɸ(M, *, RDF_TYPE, *, RDF_STATEMENT), 1)
                ∩ σ(ɸ(M, *, RDF_SUBJECT, *, *), 1)
                ∩ σ(ɸ(M, *, RDF_PREDICATE, *, *), 1)
                ∩ σ(ɸ(M, *, RDF_OBJECT, *, *), 1) 

Finally, for each r in R, add a quad as follows:

Figure 12
s = σ(ɸ(M, r, RDF_SUBJECT, *, *), 4)
p = σ(ɸ(M, r, RDF_PREDICATE, *, *), 4)
o = σ(ɸ(M, r, RDF_OBJECT, *, *), 4)

(r, REIFIES, _, σ(ɸ(M, s, p, *, o), 3))
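
A possible rendering of this reification step in Python is given below, as a sketch only. It assumes quads are 4-tuples of strings in a set, takes the reified subject, predicate and object from the value position of the rdf:subject, rdf:predicate and rdf:object quads of each reifying node, and generates fresh identities of the form _r1, _r2, ... for the new quads. The function name add_reifies_quads is hypothetical.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def add_reifies_quads(model):
    """Return model plus one REIFIES quad per (reifying node, reified quad) pair."""
    def ident(local_name):
        # The identifier carrying the given rdf: URI, or None if it is absent.
        return next((s for s, p, _, o in model
                     if p == "RDF_URI" and o == RDF + local_name), None)

    def values(s, p):
        return {o for s2, p2, _, o in model if s2 == s and p2 == p}

    rdf_type, rdf_statement = ident("type"), ident("Statement")
    fresh = (f"_r{n}" for n in count(1))
    result = set(model)
    reifiers = {s for s, p, _, o in model if p == rdf_type and o == rdf_statement}
    for r in sorted(reifiers):
        subjects = values(r, ident("subject"))
        predicates = values(r, ident("predicate"))
        objects = values(r, ident("object"))
        for s, p, i, o in sorted(model, key=repr):
            if s in subjects and p in predicates and o in objects:
                result.add((r, "REIFIES", next(fresh), i))
    return result
```

When the reified statement is not itself present in the model, no quad matches and no REIFIES quad is added.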

The existence and form of an inverse transformation should be obvious.

This representation of RDF is very similar to the quads-with-statement-id representation of RDF, but differs in that URIs are not allowed in the first three positions. This difference was motivated by the goal of aligning the RDF representation with that of topic maps. Note that while this does not explicitly support representing context, context is easily supported through a special property that can be used in another quad with the first as the subject.

Differences and problems

As hinted earlier the modelling of topic maps and RDF in the previous section has a number of problems. The basic problem is that the topic map and RDF representations of the same information are quite different. It's not possible to simply view a topic map in Q4 as RDF or vice versa and get a reasonable result, which reduces the value of Q4 as a model.

The problems with the representations described above are:

  • Binary associations have a different structure in topic maps (one identifier and three quads) than in RDF (one quad).
  • RDF language tags do not fit into the topic maps model as currently represented.
  • Reification in RDF and reification in topic maps do not match.
  • The META_TYPE quads do not make sense in RDF, and, conversely, these do not exist in RDF-in-Q.
  • In the TMDM representation it is possible to get quads which only differ in the identity part from topic names, occurrences, or variants which have the same parent, type, and value, but whose scopes vary. This violates one of the fundamental constraints on the Q model. (The same actually applies to the RDF representation, because of the language tags.)
  • The handling of identifying URIs is different (three properties in topic maps, one in RDF).

We will go through the problems one by one, and see what can be done to resolve them.

Associations

Associations in topic maps can be divided into three groups: unary, binary, and n-ary, and each group has its own problems. N-ary associations are represented the same way in both models, but in RDF there is no way to know which nodes represent associations. This is part of a more general problem, and so we defer this to section “META_TYPE”.

Binary associations, on the other hand, are represented as a single quad in RDF, but become three quads in topic maps. This can be solved by creating special "association template" nodes which represent a particular combination of association type and association role types. In the original representation, the association stating that I am employed by Ontopia would look as follows:

Figure 13
(assoc, employee, _, lmg)
(assoc, TYPE, _, employed-by)
(assoc, employer, _, ontopia)

Using association template nodes, the representation would be:

Figure 14
(lmg, template, _, ontopia)
(template, TYPE, _, employed-by)
(template, SUBJECT_ROLE, _, employee)
(template, OBJECT_ROLE, _, employer)

It is relatively simple to modify the model-building procedure to create and use templates for binary associations instead of the current representation. The benefit is that this aligns the RDF and topic maps representations, and that the topic maps representation becomes considerably more compact. The original used n*3 quads for n binary associations following the same template while the modified representation uses n+3 quads.
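
The saving can be illustrated with a short Python sketch. It assumes each binary association is given as a 5-tuple (association type, first role type, first player, second role type, second player), names templates template1, template2, ..., and writes the anonymous identity as "_" in the same way as the figures. The function name to_template_form is hypothetical.

```python
def to_template_form(associations):
    """Rewrite binary associations as one quad each, plus three quads per template."""
    templates = {}   # (association type, subject role, object role) -> template node
    quads = []
    for assoc_type, role1, player1, role2, player2 in associations:
        key = (assoc_type, role1, role2)
        if key not in templates:
            template = f"template{len(templates) + 1}"
            templates[key] = template
            quads += [
                (template, "TYPE", "_", assoc_type),
                (template, "SUBJECT_ROLE", "_", role1),
                (template, "OBJECT_ROLE", "_", role2),
            ]
        quads.append((player1, templates[key], "_", player2))
    return quads
```

For three employed-by associations following the same template this produces 3 + 3 = 6 quads, where the original representation would need 3 * 3 = 9.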

Note that there is one weakness with this approach: in TMDM it is possible to reify association roles, but in this representation of binary associations that is no longer possible. The association as a whole can be reified, but not the roles. This can be worked around, using an approach similar to that taken for RDF reification. This is, to put it mildly, ugly, but on the other hand, this is an extremely rarely used feature. In fact, this author does not know of a single example where this has been done¹.

Unary associations are less straightforward. The general consensus is that these represent assertions of simple facts, like "this case is closed", "this company is bankrupt", "this article is a draft", and so on. This would become two quads in the current representation:

(assoc, TYPE, _, is-bankrupt)
(assoc, company, _, barings-bank)

which would seem very odd as an RDF representation of this information. A more natural representation would be

(barings-bank, is-bankrupt, _, company)

and so this is what we will use.

Language tags

The natural representation of RDF language tags is as scope in a topic map, since the language tag specifies a context in which the literal is valid. However, in RDF the language tags are strings, while scope in a topic map is composed of topics. This can be overcome by creating identifiers for the language tags, and giving them RDF_URIs of the form http://psi.ontopia.net/rfc-3066/xxx. This means there is a way to view the RDF data as topic maps (and vice versa), and also a simple transformation to and from RDF.
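
A minimal Python sketch of this rewriting follows, assuming quads are 4-tuples in a set and that language tags appear as LANGUAGE_TAG quads as in the naïve RDF representation. The identifier scheme lang-xx, the quad identities _u-xx, and the function name language_tags_to_scope are hypothetical; only the URI prefix is taken from the text.

```python
PSI_BASE = "http://psi.ontopia.net/rfc-3066/"


def language_tags_to_scope(model):
    """Replace LANGUAGE_TAG quads by SCOPE quads pointing at language identifiers."""
    result, themes = set(), {}
    for s, p, i, o in sorted(model, key=repr):
        if p != "LANGUAGE_TAG":
            result.add((s, p, i, o))
            continue
        if o not in themes:
            # One identifier per distinct tag, tied to its URI with RDF_URI.
            themes[o] = f"lang-{o}"
            result.add((themes[o], "RDF_URI", f"_u-{o}", PSI_BASE + o))
        result.add((s, "SCOPE", i, themes[o]))
    return result
```

The reverse direction is equally mechanical: a SCOPE quad whose theme has an RDF_URI with this prefix can be turned back into a language tag by stripping the prefix.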

Reification

Reification seems on the surface not to cause any problems, but there turns out to be a subtle difference between topic maps and RDF here. In topic maps reification is a direct connection between the reified construct and the reifying topic, which makes for efficient implementation. In RDF the connection is indirect (in the standard model, at least; implementations may and do differ), and one feature of the indirect connection is that it is possible to make statements about a statement without asserting it. That is, it is possible to assert that "person X claims that Y is of type Z" without actually asserting that "Y is of type Z". This is done by creating a blank node for the statement, and asserting that person X claims this, without actually including the triple asserting "Y is of type Z". This is not possible in topic maps, and this needs to be catered for.

In Q it is most natural to represent reification not as in the RDF model, but using the identity of the quad representing the reified statement. This is the only way to represent reification in topic maps, so clearly this is the approach that must be taken. The solution is to omit the REIFIES quad when the statement is not asserted. This preserves the fact that the statement is unasserted and has a natural interpretation in topic maps².

META_TYPE

There are two related issues with META_TYPE. Firstly, how can RDF data be viewed as topic maps when the META_TYPE quads are missing from the RDF data? Secondly, should these quads be included in the RDF view of a topic map, and if so, how?

The META_TYPE quads must be added to the RDF data in some way, in order for the data to be viewable as a topic map. This can be done either by a human being, or it can be done automatically by software using heuristics and other methods. In either case, the issue of how to add the missing information to RDF data is outside the scope of this paper. It is enough to know that the problem is solvable, since in the last resort human beings really can do this.

As the META_TYPE quads are statements about the name and occurrence types in the topic map they can be expressed as ordinary RDF triples, and since this information is necessary in order to be able to return to topic maps, it should be preserved. To define an RDF META_TYPE property would be sufficient.

Duplicates

The handling of duplicates in both RDF and topic maps is an issue for the Q4 model. The following topic map (in LTM syntax) causes difficulties:

Figure 15
[fish = "Fish" = "Fisk" / norwegian = "Fisk" / swedish]

This turns into the following in the Q4 representation:

Figure 16
(fish, TOPIC_NAME, _, "Fish")
(fish, TOPIC_NAME, s1, "Fisk")
(s1, SCOPE, _, norwegian)
(fish, TOPIC_NAME, s2, "Fisk")
(s2, SCOPE, _, swedish)

This, however, violates the constraint that two quads that are different only in the ID part are not allowed. The same information expressed in RDF with the skos:prefLabel property and language tags would have the same structure and cause the same difficulty (replace TOPIC_NAME with skos:prefLabel and norwegian/swedish with the correct RFC 3066-based URIs).

There are two obvious ways to approach this problem, each of which has its own problems. One approach is to assert the duplicate quads only once. The RDF language tags can then be asserted with the same quad as the subject. This does not work for scope in topic maps, however, since scope is a set, and this would lose information about the set boundaries. This can be solved by creating one identifier for each unique scope set in the topic map, and then relating the scoped quads to the scope identifiers for their sets. This has the added advantage of being a much more compact representation than the naïve representation.
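
The idea of one identifier per unique scope set can be sketched as follows. This Python fragment is illustrative only: the property name THEME, the identifiers scope1, scope2, ..., and the function name assign_scope_ids are hypothetical, and the anonymous identity is written as "_" as in the figures.

```python
def assign_scope_ids(scoped_statements):
    """scoped_statements: iterable of (statement identity, set of scoping topics)."""
    scope_ids = {}   # frozenset of themes -> scope identifier
    quads = []
    for statement, themes in scoped_statements:
        key = frozenset(themes)
        if key not in scope_ids:
            scope_id = f"scope{len(scope_ids) + 1}"
            scope_ids[key] = scope_id
            # The members of the set are stated once, on the scope identifier.
            quads += [(scope_id, "THEME", "_", theme) for theme in sorted(key)]
        quads.append((statement, "SCOPE", "_", scope_ids[key]))
    return quads
```

Each scoped statement then needs a single SCOPE quad regardless of how many topics its scope contains, and the boundaries of each scope set are preserved on the scope identifier.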

Unfortunately, even with this modification the representations are lossy, because in both RDF and topic maps it is possible to reify the qualified statements explicitly. That is, statements can be made about "fish is called 'fisk' in Norwegian" separately from the statement that "fish is called 'fisk' in Swedish". In theory this could be worked around by making the reification apply to the qualifying quads instead of the base quad, but this would be a departure from normal reification, and as such complicate the representation substantially. That, again, could be worked around by saying that every quad representing a statement must have a scope, even if that is the unconstrained scope. This would be unacceptable in an implementation, but is certainly workable in a formal model.

The alternative is to simply remove the constraint from the Q model. This allows the most natural and straightforward expression of the information, at the cost of some loss in compactness. Another cost is that this means the removal of duplicate information becomes much harder, since if (x, y, _, z) is now added twice, it cannot be collapsed into a single quad, as one of the two may later be qualified and reified and thus need to be preserved as a separate entity. This entails a severe loss in efficiency in real implementations, and so means that this approach is not feasible for an implementation.

The third, less obvious, alternative is to shift to a quintuple representation of the form (s, p, i, c, o), where the "c" is the context. Context could then be used for the scope identifier in topic maps, and for the language tag when representing RDF. This would solve at a stroke the problems with duplicates and give a simple reification representation. Another good thing is that this solution is equally compatible with the name "Q model". The downside, of course, is that quintuples are getting quite large compared to the original triples of RDF, and for this reason the model using quints is likely to meet additional resistance.
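
A small Python sketch shows why the extra element removes the duplicate problem for the "Fisk" example. It assumes that the identity is always the third element, that everything else makes up the statement proper, and that a single scoping topic can stand in directly for the context. The function name has_duplicates is hypothetical.

```python
def has_duplicates(statements):
    """True if two tuples agree in every position except the identity (index 2)."""
    seen = set()
    for st in statements:
        key = st[:2] + st[3:]   # drop the identity element
        if key in seen:
            return True
        seen.add(key)
    return False


# Quads: the two "Fisk" names differ only in their identity.
quads = [
    ("fish", "TOPIC_NAME", "s1", "Fisk"),
    ("fish", "TOPIC_NAME", "s2", "Fisk"),
]

# Quints (s, p, i, c, o): the context element tells the two statements apart.
quints = [
    ("fish", "TOPIC_NAME", "s1", "norwegian", "Fisk"),
    ("fish", "TOPIC_NAME", "s2", "swedish", "Fisk"),
]
```

Here has_duplicates(quads) holds while has_duplicates(quints) does not, so the uniqueness constraint can be kept unchanged in the quintuple form.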

Identifying URIs

This is clearly the thorniest problem in creating a common representation for RDF and topic maps. In RDF a URI can identify a resource without any consideration being given to whether the node having the URI represents the result of resolving the URI, or whether the URI merely identifies some abstract thing that is not network-retrievable [Pepper03]. In topic maps, on the other hand, in the first case the URI is considered a "subject locator", and in the latter it is a "subject identifier". This is reflected in the naïve version of the Q model, where URIs are attached to identifiers with RDF_URI in RDF, and with SUBJECT_IDENTIFIER or SUBJECT_LOCATOR in topic maps.

This, of course, is the problem. The representation of the identifying URI is different in the Q representations of topic maps and RDF, which means that it's not possible to work with the model without knowing whether one is working with RDF or topic maps. In other words, it fails the goal of creating a common representation for topic maps and RDF.

The solution to the problem must be informed by the semantics of the subject identifier/locator distinction. So, what does the designation of a URI as the subject locator for a topic tell us? Basically, it tells us two things:

  • That the subject of the topic is an information resource. This is inherent in the semantics of the construct, since if something can be retrieved over the network, it is by definition an information resource. Note that "information resource" is actually a class of things, so in one sense this is indirectly providing type information. In RDFS one would state this as (tm:subject-indicator, rdfs:domain, InformationResource).
  • That this particular URI has the property that resolving it is going to give us the subject represented by the topic.

Assigning a URI as the subject identifier of a topic has slightly different semantics. It does not tell us anything about the class of the subject (information resources can have subject identifiers, even if they are unlikely to), but it does tell us that what we get if we resolve the URI is a resource describing or indicating the subject. We do not get the subject itself by resolving the URI.

From this we can conclude that if we know an RDF resource is an instance of a non-information resource class, then the URI must be a subject identifier. It also appears safe to assume that if an RDF resource is known to be an instance of the information resource class the URI is its subject locator.

Theoretically, the resolution to TAG issue http-range-14 [http-range-14] provides another way to make the distinction for http URIs, which is through resolving the URIs and making the distinction based on the HTTP response code. However, in practice, this is not feasible, since processing large numbers of resources will require large amounts of time, and in most cases the findings will be inconclusive. Another problem is that current practice does not conform with the TAG's issue resolution. So in practice this is not an alternative.

The basic problem here is that what is represented with one property in RDF is represented with two in topic maps. In order to achieve the goal of having a single representation for both models, either both must use a single property, or both must use two properties.

Let's consider the use of two properties first. This is of course unproblematic for topic maps, but for RDF it presents considerable difficulties. In order to know which property to use for the URI of a given resource we need to know if that resource is an information resource, but this type information is unlikely to be available in the RDF data. Some inferences can be made, since RDF properties and classes cannot be information resources, but even so, for most resources it will be impossible to tell whether or not they are information resources.

Having dismissed the use of two properties, let's consider the use of a single property, say NODE_URI. Using a single property is already done in the naïve RDF representation, but if this is done in the topic maps representation as well it means that the subject locator/identifier distinction must be preserved by other means, and that this mechanism must also be available in the RDF representation. The following alternatives exist:

  • Add another quad about each NODE_URI quad that makes it clear whether this is a subject locator or subject identifier. This has the benefit that it preserves the distinction for each URI assignment individually. The downside is that it is by no means compact, and may require massive changes to RDF data sets in order to add the missing information.
  • Define a class "information resource" and assume that all NODE_URIs of instances of this class are subject locators, whereas NODE_URIs of anything that is not an instance of this class are subject identifiers. This loses some information, since it's conceivable that the same identifier will have URIs of both types. In order to do this it is necessary to be able to distinguish between types that are really asserted in the source data, and types that are asserted only in order to clarify the interpretation of NODE_URI quads. One way to do this might be to define a particular scope which holds model-level information.
  • Use META_TYPE to make it clear for each identifier separately whether or not it represents an information resource. This is similar to the previous alternative, but doesn't require any special solution in order to make it clear that the information is not from the source data.

On balance, using a single property, defining a special class, and using scope to separate explicitly stated quads from quads that are artifacts of the translation process seems the best alternative.

A unified model

Having presented the naïve approach and the problems with that approach, we are now ready to present the real Q model, which is considerably more involved than the simple Q4 model presented above.

The core Q model

I, L, and A are as before, as is the division of L into datatypes, and the operations on datatypes. A model is a subset of the set (I x I x I x I x A); that is, we are now using quints.

The subscript operator, written as a [n] postfix, extracts the nth element in a quint. This allows us to define the following functions for accessing the components of quints:

subj(q) = q[1]
pred(q) = q[2]
id(q)   = q[3]
con(q)  = q[4]
val(q)  = q[5]

A filtering function ɸ which produces a subset of a model matching a certain pattern is defined as follows:

ɸ(M, v, w, x, y, z)
  = {q ∈ M | (v=* or subj(q)=v) and
     (w=* or pred(q)=w) and
     (x=* or id(q)=x) and
     (y=* or con(q)=y) and
     (z=* or val(q)=z)}

A selection function σ selecting all identifiers appearing in a given position in a given model is defined as:

σ(M, n)
  = {z ∈ A | ∃q ∈ M : q[n]=z}

A second filter function φ produces the subset of a model which has a particular subject and one of a set of properties. It is defined as:

φ(M, s, P)
  = {q ∈ M | subj(q) = s and pred(q) ∈ P}
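
As an illustration, the operations defined so far can be sketched in Python. This is only a sketch under assumptions of our own: a model is a Python set of five-element tuples, the wildcard * is a sentinel object called ANY, trailing wildcards may be omitted, and the quint identities q1 to q3 are made up for the example.

```python
from typing import Any, NamedTuple

ANY = object()  # stands in for the "*" wildcard


class Quint(NamedTuple):
    subj: Any
    pred: Any
    id: Any
    con: Any
    val: Any


def phi(model, *pattern):
    """Pattern filter: keep quints matching every non-wildcard position."""
    pattern += (ANY,) * (5 - len(pattern))  # omitted positions are wildcards
    return {q for q in model
            if all(p is ANY or p == c for p, c in zip(pattern, q))}


def sigma(model, n):
    """Selection: all values found in position n (1-based, as in q[n])."""
    return {q[n - 1] for q in model}


def phi_props(model, s, props):
    """Second filter: quints with subject s and a predicate in props."""
    return {q for q in model if q.subj == s and q.pred in props}


M = {
    Quint("tosca", "rdfs_label", "q1", "U", "Tosca"),
    Quint("tosca", "rdf_type", "q2", "U", "music_opera"),
    Quint("puccini", "rdfs_label", "q3", "U", "Giacomo Puccini"),
}
tosca_predicates = sigma(phi(M, "tosca"), 2)
```

Here tosca_predicates holds the two predicates used with the subject tosca. Representing the model as a plain set keeps the sketch close to the set-theoretic definitions; a real engine would need indexes to make the filters efficient.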

Having defined the operations, we can more easily define the constraints on valid model instances, which are:

  • For each p ∈ M it must hold that ɸ(M, p[1], p[2], *, p[4], p[5]) = {p}. (Or, more informally, the same quint cannot appear twice with different identifiers.)
  • For each p ∈ M it must hold that ɸ(M, *, *, p[3], *, *) = {p}. (Or, more informally, the id element in each quint must be unique within the model.)
  • σ(M, 2) ∩ σ(M, 3) = ∅. (Or, more informally, the identity of a quint cannot be used as a property.)
  • σ(M, 4) ∩ σ(M, 3) = ∅. (Or, more informally, the identity of a quint cannot be used as a context.)
  • σ(M, 4) ∩ σ(M, 2) = ∅. (Or, more informally, the context of a quint cannot be used as a property.)
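
The five constraints can be checked mechanically. The following is a minimal sketch, assuming that a model is given as a collection of plain five-element tuples; the function name and the wording of the returned messages are our own and not part of the model.

```python
from collections import Counter


def violations(model):
    """Return the names of the constraints that a set of 5-tuples breaks."""
    quints = list(model)
    found = []

    # Identity uniqueness: no two quints share the same id (position 3).
    ids = Counter(q[2] for q in quints)
    if any(n > 1 for n in ids.values()):
        found.append("duplicate id")

    # Statement uniqueness: ignoring the id, no two quints are equal.
    bodies = Counter((q[0], q[1], q[3], q[4]) for q in quints)
    if any(n > 1 for n in bodies.values()):
        found.append("duplicate statement")

    def position(n):
        return {q[n - 1] for q in quints}

    if position(2) & position(3):
        found.append("id used as property")
    if position(4) & position(3):
        found.append("id used as context")
    if position(4) & position(2):
        found.append("context used as property")
    return found
```

A valid model yields the empty list, so an engine could run such a check after a bulk load rather than on every insertion.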

Finally, the function id⁻¹ finds the quint with a given identity in the model:

id⁻¹(M, i) = q ∈ M | id(q) = i

Transforming RDF to Q

Let i be the same function as before, and it(t) a function which produces a unique identifier for each language tag. Given an RDF model M, start with an empty Q model, then:

  • for each language tag t used in M add the quints (c, SCOPE_MEMBER, _, Q, it(t)), and (it(t), NODE_URI, _, Q, <http://psi.ontopia.net/rfc-3066/t>).
  • for each triple (s, p, o) in M where o is not a literal add the quint (i(s), i(p), _, U, i(o)),
  • for each triple (s, p, o) in M where o is a typed literal add the quint (i(s), i(p), _, U, v), where v = I(u)(o) and u is the URI of the datatype of o,
  • for each triple (s, p, o) in M where o is an untyped literal, add the quint (i(s), i(p), _, c, o); if o has a language tag t and M' is the current Q model, c = σ(ɸ(M', *, SCOPE_MEMBER, *, *, it(t)), 1), while if o has no language tag c = U, and
  • for each RDF URI reference u in M add the quint (i(u), NODE_URI, _, U, u).
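
The steps above can be sketched as follows. The sketch rests on several assumptions of our own: URI references and literals are small Python classes, blank nodes are plain strings that serve as their own identifiers, the function i is approximated by prefixing "n:" to the URI, the scope and language tag identifiers are built with the prefixes "scope:" and "lang:", and the interpretation function I(u) is a lookup table called PARSERS that knows only one datatype.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

U, Q = "U", "Q"  # placeholder identifiers for the two contexts
PARSERS = {"http://www.w3.org/2001/XMLSchema#integer": int}  # stands in for I(u)


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def node_id(node):
    """The function i: blank nodes (plain strings) are their own identifier."""
    return "n:" + node.value if isinstance(node, URI) else node


def rdf_to_q(triples):
    ids, bodies = count(1), {}

    def add(s, p, c, v):
        # The same statement never yields a second quint.
        if (s, p, c, v) not in bodies:
            bodies[(s, p, c, v)] = f"q{next(ids)}"

    for s, p, o in triples:
        for node in (s, p, o):
            if isinstance(node, URI):
                add(node_id(node), "NODE_URI", U, node.value)
        if not isinstance(o, Literal):
            add(node_id(s), node_id(p), U, node_id(o))
        elif o.datatype:
            parse = PARSERS.get(o.datatype, str)
            add(node_id(s), node_id(p), U, parse(o.text))
        else:
            c = U
            if o.lang:
                c, tag = "scope:" + o.lang, "lang:" + o.lang
                add(c, "SCOPE_MEMBER", Q, tag)
                add(tag, "NODE_URI", Q, "http://psi.ontopia.net/rfc-3066/" + o.lang)
            add(node_id(s), node_id(p), c, o.text)
    return {(s, p, i, c, v) for (s, p, c, v), i in bodies.items()}
```

Keying the intermediate dictionary on everything except the identity is one way to make adding a statement idempotent, which the constraints on valid models require.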

Again we need to add reification quads:

Figure 17: Adding reification quads
RDF_TYPE           = σ(ɸ(M, *, NODE_URI, *, *, rdf:type), 1)
RDF_STATEMENT      = σ(ɸ(M, *, NODE_URI, *, *, rdf:Statement), 1)
RDF_SUBJECT        = σ(ɸ(M, *, NODE_URI, *, *, rdf:subject), 1)
RDF_PREDICATE      = σ(ɸ(M, *, NODE_URI, *, *, rdf:predicate), 1)
RDF_OBJECT         = σ(ɸ(M, *, NODE_URI, *, *, rdf:object), 1)

R                  = σ(ɸ(M, *, RDF_TYPE, *, *, RDF_STATEMENT), 1)
                     ∩ σ(ɸ(M, *, RDF_SUBJECT, *, *, *), 1)
                     ∩ σ(ɸ(M, *, RDF_PREDICATE, *, *, *), 1)
                     ∩ σ(ɸ(M, *, RDF_OBJECT, *, *, *), 1) 

Finally, for each r in R, let

s = σ(ɸ(M, r, RDF_SUBJECT, *, *, *), 5)
p = σ(ɸ(M, r, RDF_PREDICATE, *, *, *), 5)
o = σ(ɸ(M, r, RDF_OBJECT, *, *, *), 5)
q = σ(ɸ(M, s, p, *, *, o), 3)
and if q is non-empty add the following quint:
(r, REIFIES, _, Q, q)
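
The reification step can be sketched in the same style. This is a hedged sketch, not a definitive procedure: it assumes tuples for quints and string identifiers, reads the reified subject, predicate, and object from the value position of the rdf:subject, rdf:predicate, and rdf:object quints, and generates the identities r1, r2, and so on for the new quints itself.

```python
from itertools import product

ANY = object()  # the "*" wildcard
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def phi(model, *pattern):
    pattern += (ANY,) * (5 - len(pattern))  # omitted positions are wildcards
    return {q for q in model
            if all(p is ANY or p == c for p, c in zip(pattern, q))}


def sigma(model, n):
    return {q[n - 1] for q in model}


def reification_quints(M):
    def named(local):
        # Identifiers whose NODE_URI is the given term of the RDF vocabulary.
        return sigma(phi(M, ANY, "NODE_URI", ANY, ANY, RDF + local), 1)

    def having(props, value=ANY):
        # Subjects of quints that use one of props, optionally with a value.
        return {q[0] for p in props for q in phi(M, ANY, p, ANY, ANY, value)}

    def values(r, props):
        return {q[4] for p in props for q in phi(M, r, p)}

    types, stmts = named("type"), named("Statement")
    subj_p, pred_p, obj_p = named("subject"), named("predicate"), named("object")

    R = ({r for st in stmts for r in having(types, st)}
         & having(subj_p) & having(pred_p) & having(obj_p))
    out = set()
    for r in sorted(R):
        combos = product(values(r, subj_p), values(r, pred_p), values(r, obj_p))
        for s, p, o in combos:
            for q in sorted(sigma(phi(M, s, p, ANY, ANY, o), 3)):
                out.add((r, "REIFIES", f"r{len(out) + 1}", "Q", q))
    return out
```

If the reified statement is not itself present in the model, no REIFIES quint is produced, which mirrors the condition that q be non-empty.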

An RDF example

To demonstrate how the transformation works, let's convert the RDF test case in [RDFTM] to Q, using the above transformation. The test case, in n3 syntax, is:

Figure 18: The RDF2TM test case
@prefix music: <http://psi.ontopia.net/music/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[ rdf:type music:opera;
  rdfs:label "Tosca";
  music:premiere-date "1900-01-14";
  music:synopsis <http://www.azopera.com/learn/synopsis/tosca.shtml>;
  music:composed-by [
    rdf:type music:person;
    rdfs:label "Giacomo Puccini" ]
] .

# ---------------------------------------

music:person	rdfs:label "Person" .
music:opera	rdfs:label "Opera" .

music:composed-by rdfs:label "Composed by" .
music:premiere-date rdfs:label "Première date" .
music:synopsis	rdfs:label "Synopsis" .

If we apply the transformation above to this, the result is as shown below. The identifiers "tosca" and "puccini" are used for those two blank nodes, respectively, and for nodes which have URIs the identifiers used are the qnames for those URIs, with the colon replaced by an underscore.

Figure 19: Q representation of RDF2TM test case
(tosca, rdf_type, _, U, music_opera)
(tosca, rdfs_label, _, U, "Tosca")
(tosca, music_premiere-date, _, U, "1900-01-14")
(tosca, music_synopsis, _, U, <http://www.azopera.com/learn/synopsis/tosca.shtml>)
(tosca, music_composed-by, _, U, puccini)
(puccini, rdf_type, _, U, music_person)
(puccini, rdfs_label, _, U, "Giacomo Puccini")
(rdf_type, NODE_URI, _, U, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>)
(music_opera, NODE_URI, _, U, <http://psi.ontopia.net/music/#opera>)
(rdfs_label, NODE_URI, _, U, <http://www.w3.org/2000/01/rdf-schema#label>) 

The remaining NODE_URI and rdfs_label quints are left out, since they become rather repetitive.

Transforming Q to RDF

To transform a Q model M into RDF, let i⁻¹ be the inverse of the function i defined previously, and

M' = M - ɸ(M, *, *, *, Q, *)
then, for each q in M' where val(q) is a literal, add the following triple to the RDF model: (i⁻¹(subj(q)), i⁻¹(pred(q)), i⁻¹(I'(val(q))(val(q)))). Repeat the triple once with each language tag in σ(ɸ(M, *, *, *, Q, *), 5).

For each q in M' where val(q) is an identifier, add the following triple to the RDF model: (i⁻¹(subj(q)), i⁻¹(pred(q)), i⁻¹(val(q))).
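
A rough sketch of the reverse direction follows. It makes simplifying assumptions that are ours alone: literals are wrapped in a small class Lit, identifiers are plain strings, i⁻¹ maps an identifier to its NODE_URI value or else to a blank node label, NODE_URI quints themselves are not emitted as triples, and language tags and datatypes are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal value; identifiers are plain strings in this sketch."""
    text: str


def q_to_rdf(M):
    # i⁻¹: identifiers with a NODE_URI map back to that URI, the rest
    # become blank node labels (an assumption of this sketch).
    uri_of = {s: v for s, p, _, c, v in M if p == "NODE_URI"}

    def inv(ident):
        return uri_of.get(ident, "_:" + ident)

    triples = set()
    for s, p, _, c, v in M:
        if c == "Q" or p == "NODE_URI":
            continue  # model-level quints and URI bookkeeping are skipped
        obj = v if isinstance(v, Lit) else inv(v)
        triples.add((inv(s), inv(p), obj))
    return triples
```

If an identifier carries more than one NODE_URI, this sketch simply keeps one of them; a fuller treatment would have to decide how to express the others.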

Transforming TMDM to Q

To transform a TMDM instance to Q, start with the empty model. Let si(x) be a function which, given a set of topic items, produces an identifier in I. Let U = si(∅), and Q the scope of model-level information. For each set s of topic items in a [scope] property in the source topic map, add the following quint for each topic item t in the set: (si(s), SCOPE_MEMBER, _, Q, t).

Let the topic map item be tm, and

  • for each value l in tm.[item identifiers] add the quint (tm, ITEM_IDENTIFIER, _, U, l), and
  • for each value t in tm.[topics], follow the procedure for topic items below

The procedure for any topic item t is:

  • add the quint (tm, TOPIC, _, Q, t),
  • for each value n in t.[topic names] add the quint (t, n.[type], n, si(n.[scope]), n.[value]); then follow the procedure for topic name items below,
  • for each value o in t.[occurrences] add the quint (t, o.[type], o, si(o.[scope]), I(o.[datatype])(o.[value])); then follow the procedure for occurrence items below,
  • if there is a value in t.[reified] add the quint (t, REIFIES, _, Q, t.[reified]),
  • for each value l in t.[subject identifiers] add the quint (t, NODE_URI, _, Q, l),
  • for each value l in t.[subject locators] add the quint (t, NODE_URI, _, Q, l),
  • if t.[subject locators] is not empty, add the quint (t, TYPE_INSTANCE, _, Q, INFORMATION_RESOURCE), and
  • for each value l in t.[item identifiers] add the quint (t, ITEM_IDENTIFIER, _, Q, l).

The procedure for any topic name item n is:

  • add the quint (n.[type], META_TYPE, _, Q, TOPIC_NAME),
  • for each value v in n.[variants] add the quint (n, VARIANT, v, si(v.[scope]), I(v.[datatype])(v.[value])), then follow the procedure for variant items below, and
  • for each value l in n.[item identifiers] add the quint (n, ITEM_IDENTIFIER, _, Q, l).

The procedure for any variant item v is:

  • for each value l in v.[item identifiers] add the quint (v, ITEM_IDENTIFIER, _, Q, l).

The procedure for any occurrence item o is:

  • add the quint (o.[type], META_TYPE, _, Q, OCCURRENCE),
  • for each value l in o.[item identifiers] add the quint (o, ITEM_IDENTIFIER, _, Q, l).
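
The procedures for topic items, topic names, and occurrences can be sketched together. The sketch assumes that TMDM items are given as Python dictionaries with keys of our own choosing, that si builds a scope identifier by joining the sorted topic identifiers, and that occurrence values are passed through without applying I. Variants and item identifiers are omitted for brevity. A topic is marked as an information resource when it has at least one subject locator, which is the reading that the reverse transformation relies on.

```python
from itertools import count

U, Q = "U", "Q"  # placeholder identifiers for the two contexts


def si(scope):
    """Identifier for a set of scoping topics; the empty scope gives U."""
    return U if not scope else "scope:" + "+".join(sorted(scope))


def topic_to_q(tm, topic):
    fresh, quints = count(1), {}

    def add(s, p, c, v, ident=None):
        # Keyed on everything but the identity, so repeats collapse.
        if (s, p, c, v) not in quints:
            quints[(s, p, c, v)] = ident or f"q{next(fresh)}"

    t = topic["id"]
    add(tm, "TOPIC", Q, t)
    for kind, meta in (("names", "TOPIC_NAME"), ("occurrences", "OCCURRENCE")):
        for item in topic.get(kind, []):
            for member in item["scope"]:
                add(si(item["scope"]), "SCOPE_MEMBER", Q, member)
            # The name or occurrence item itself becomes the quint identity.
            add(t, item["type"], si(item["scope"]), item["value"], item["id"])
            add(item["type"], "META_TYPE", Q, meta)
    locators = topic.get("subject_locators", [])
    for uri in topic.get("subject_identifiers", []) + locators:
        add(t, "NODE_URI", Q, uri)
    if locators:
        add(t, "TYPE_INSTANCE", Q, "INFORMATION_RESOURCE")
    return {(s, p, i, c, v) for (s, p, c, v), i in quints.items()}
```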

For all (at, rt1, rt2) where there exists an association item a and association role items r1 and r2 such that a.[roles] = {r1, r2} and a.[type] = at, r1.[type] = rt1 and r2.[type] = rt2, add the following quints:

Figure 20
(t, ASSOCIATION_TYPE, _, Q, at)
(t, SUBJECT_ROLE, _, Q, rt₁) 
(t, OBJECT_ROLE, _, Q, rt₂)

For all association items a such that a.[type] = at and where there exist association role items r1 and r2 such that a.[roles] = {r1, r2}, r1.[type] = rt1 and r2.[type] = rt2, add the following quints: (r1.[player], t, ai, si(a.[scope]), r2.[player]), (tm, ASSOCIATION, _, Q, ai).

Finally, for all association items a which are not already processed,

  • add the quint (tm, ASSOCIATION, _, Q, a),
  • add the quint (a, TYPE, _, si(a.[scope]), a.[type]),
  • for each value r in a.[roles] add the quint (a, r.[type], r, si(a.[scope]), r.[player]), then for each value l in r.[item identifiers] add the quint (r, ITEM_IDENTIFIER, _, U, l), and
  • for each value l in a.[item identifiers] add the quint (a, ITEM_IDENTIFIER, _, U, l).

A topic maps example

Again we use a test case from [RDFTM], but this time the TM2RDF test case, which is shown below in LTM syntax. It contains the same information, but has been rearranged slightly, in order to make the resulting Q model as similar to the one from the RDF2TM example as possible.

Figure 21: The TM2RDF test case
[tosca : opera = "Tosca"]
{tosca, premiere-date, [[1900-01-14]]}
{tosca, synopsis,	"http://www.azopera.com/learn/synopsis/tosca.shtml"}

composed-by( tosca : work, puccini : composer )
[puccini : person = "Giacomo Puccini"]

/* ------------------------------------- */

[person	  = "Person"	@"http://psi.ontopia.net/music/#person"]
[composer = "Composer"	@"http://psi.ontopia.net/music/#composer"]
[opera	  = "Opera"	@"http://psi.ontopia.net/music/#opera"]
[work	  = "Work"	@"http://psi.ontopia.net/music/#work"]

[premiere-date = "Première date" @"http://psi.ontopia.net/music/#premiere-date"]
[synopsis	   = "Synopsis"	@"http://psi.ontopia.net/music/#synopsis"]
[composed-by   = "Composed by" @"http://psi.ontopia.net/music/#composed-by"]

Transformed into Q, this becomes what is shown below. The IDs of the topics in the LTM file are used as their identifiers below to make the example easier to read. Note that the URIs in the tm namespace are guesses as to what URIs will be defined in the next version of TMDM.

Figure 22: Q representation of TM2RDF test case
(tosca, type, q1, U, opera)
(tosca, tm_name, _, U, "Tosca")
(tosca, premiere-date, _, U, "1900-01-14")
(tosca, synopsis, _, U, <http://www.azopera.com/learn/synopsis/tosca.shtml>)
(tosca, cby, _, U, puccini)
(puccini, type, _, U, person)
(puccini, tm_name, _, U, "Giacomo Puccini")

(tm_name, META_TYPE, _, Q, TOPIC_NAME)
(premiere-date, META_TYPE, _, Q, OCCURRENCE)
(synopsis, META_TYPE, _, Q, OCCURRENCE)
(type, ASSOCIATION_TYPE, _, Q, tm_type-instance)
(type, SUBJECT_ROLE, _, Q, tm_type)
(type, OBJECT_ROLE, _, Q, tm_instance)
(cby, ASSOCIATION_TYPE, _, Q, composed-by)
(cby, SUBJECT_ROLE, _, Q, work)
(cby, OBJECT_ROLE, _, Q, composer)
(tm, TOPIC, _, Q, puccini)
(tm, ASSOCIATION, _, Q, q1)
(tm_name, NODE_URI, _, U, <http://psi.topicmaps.com/iso-13250/name>)

Again, most of the NODE_URI, TOPIC, ASSOCIATION, ITEM_IDENTIFIER, and tm_name quints have been left out.

The main thing to note about this example is that if RT is the Q model of the RDF2TM example, TR the model of the TM2RDF example, and TR' = TR - ɸ(TR, *, *, *, Q, *), then RT and TR' have the same structure. The only differences are the ITEM_IDENTIFIER quints in TR' and that the values for NODE_URI are different. In other words, for this example, the Q model works, in the sense that it gives exactly the same representation.

This means that if the necessary TOPIC, ASSOCIATION, and META_TYPE quints are added to the Q representation of an RDF model it will effectively be a topic map.

Transforming Q to TMDM

To transform a Q model M to TMDM, the first step is to do a little analysis on the model in order to know which predicates are name predicates, occurrence predicates, and binary association predicates. This is done as follows:

N = σ(ɸ(M, *, META_TYPE, *, *, TOPIC_NAME), 1)
O = σ(ɸ(M, *, META_TYPE, *, *, OCCURRENCE), 1)
A = σ(ɸ(M, *, ASSOCIATION_TYPE, *, *, *), 1) 

Topic map item

Now we can find the identity of the topic map. This is done with

tm = σ(ɸ(M, *, TOPIC, *, *, *), 1)
We can then construct the topic map item as follows:

Figure 23
tm.[item identifiers] = σ(ɸ(M, tm, ITEM_IDENTIFIER, *, *, *), 5)
tm.[topics]           = σ(ɸ(M, tm, TOPIC, *, *, *), 5)
tm.[associations]     = σ(ɸ(M, tm, ASSOCIATION, *, *, *), 5) 

Topic items

For each element t in

σ(ɸ(M, *, TOPIC, *, *, *), 5)
create a topic item as follows:

Figure 24
u = σ(ɸ(M, t, NODE_URI, *, *, *), 5)

t.[topic names]      = σ(φ(M, t, N), 3)
t.[occurrences]      = σ(φ(M, t, O), 3)
t.[item identifiers] = σ(ɸ(M, t, ITEM_IDENTIFIER, *, *, *), 5)
t.[parent]           = tm

Finally, if ɸ(M, t, TYPE_INSTANCE, *, *, INFORMATION_RESOURCE) is non-empty, set t.[subject locators] = u, otherwise set t.[subject identifiers] = u.
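
The reconstruction of a topic item can be sketched as below, again assuming tuples for quints and a dictionary for the resulting item. Topic name and occurrence items are taken from the identity position of the matching quints, since that is where the forward transformation places them; the predicate sets N and O are recomputed inside the function so that the example is self-contained.

```python
ANY = object()  # the "*" wildcard


def phi(model, *pattern):
    pattern += (ANY,) * (5 - len(pattern))  # omitted positions are wildcards
    return {q for q in model
            if all(p is ANY or p == c for p, c in zip(pattern, q))}


def sigma(model, n):
    return {q[n - 1] for q in model}


def topic_item(M, t, tm):
    # Predicates marked as name and occurrence types by META_TYPE quints.
    N = sigma(phi(M, ANY, "META_TYPE", ANY, ANY, "TOPIC_NAME"), 1)
    O = sigma(phi(M, ANY, "META_TYPE", ANY, ANY, "OCCURRENCE"), 1)
    uris = sigma(phi(M, t, "NODE_URI"), 5)
    info = phi(M, t, "TYPE_INSTANCE", ANY, ANY, "INFORMATION_RESOURCE")
    about_t = phi(M, t)
    return {
        "topic names": {q[2] for q in about_t if q[1] in N},
        "occurrences": {q[2] for q in about_t if q[1] in O},
        "item identifiers": sigma(phi(M, t, "ITEM_IDENTIFIER"), 5),
        # All URIs go to one property or the other, depending on the class.
        "subject locators": uris if info else set(),
        "subject identifiers": set() if info else uris,
        "parent": tm,
    }
```

As the sketch makes visible, a topic that originally had both subject locators and subject identifiers comes back with locators only, which is the loss of information accepted when the single-property design was chosen.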

Topic name items

For each element n in

σ(φ(M, t, N), 3)
create a topic name item as follows:

Figure 25
n.[value]            = σ(ɸ(M, t, *, n, *, *), 5)
n.[type]             = σ(ɸ(M, t, *, n, *, *), 2)
n.[scope]            = σ(ɸ(M, con(id⁻¹(M, n)), SCOPE_MEMBER, *, *, *), 5)
n.[variants]         = σ(ɸ(M, n, VARIANT, *, *, *), 3)
n.[item identifiers] = σ(ɸ(M, n, ITEM_IDENTIFIER, *, *, *), 5)
n.[parent]           = t

Variant items

For each element v in

σ(ɸ(M, n, VARIANT, *, *, *), 3)
create a variant item as follows:

Figure 26
v.[value]            = σ(ɸ(M, n, *, v, *, *), 5)
v.[scope]            = σ(ɸ(M, con(id⁻¹(M, v)), SCOPE_MEMBER, *, *, *), 5)
v.[item identifiers] = σ(ɸ(M, v, ITEM_IDENTIFIER, *, *, *), 5)
v.[parent]           = n

Occurrence items

For each element o in

σ(φ(M, t, O), 3)
create an occurrence item as follows:

Figure 27
o.[value]            = σ(ɸ(M, t, *, o, *, *), 5)
o.[type]             = σ(ɸ(M, t, *, o, *, *), 2)
o.[scope]            = σ(ɸ(M, con(id⁻¹(M, o)), SCOPE_MEMBER, *, *, *), 5)
o.[item identifiers] = σ(ɸ(M, o, ITEM_IDENTIFIER, *, *, *), 5)
o.[parent]           = t

Association items

Let

A = σ(ɸ(M, *, ASSOCIATION, *, *, *), 5)
B = {q ∈ M | pred(q) ∈ σ(ɸ(M, *, ASSOCIATION_TYPE, *, *, *), 1)}
For each element a in A - σ(B, 3) create an association item as follows:

Figure 28
M' = ɸ(M, a, *, *, *, *) - φ(M, a, {ITEM_IDENTIFIER, TYPE})
c  = e ∈ σ(ɸ(M, a, TYPE, *, *, *), 4)

a.[type]             = σ(ɸ(M, a, TYPE, *, *, *), 5)
a.[scope]            = σ(ɸ(M, c, SCOPE_MEMBER, *, *, *), 5)
a.[roles]            = σ(M', 3)
a.[item identifiers] = σ(ɸ(M, a, ITEM_IDENTIFIER, *, *, *), 5)
a.[parent]           = tm

For each r in ɸ(M', a, *, *, *, *) create an association role item as follows:

Figure 29
r.[player]           = val(r)
r.[type]             = pred(r)
r.[item identifiers] = σ(ɸ(M, id(r), ITEM_IDENTIFIER, *, *, *), 5)
r.[parent]           = a

For each element a in B create an association item and two association role items r1 and r2 as follows:

Figure 30
r₁.[player]           = subj(a)
r₁.[type]             = σ(ɸ(M, pred(a), SUBJECT_ROLE, *, *, *), 5)
r₁.[reifier]          = null
r₁.[item identifiers] = ∅
r₁.[parent]           = a

r₂.[player]           = val(a)
r₂.[type]             = σ(ɸ(M, pred(a), OBJECT_ROLE, *, *, *), 5)
r₂.[reifier]          = null
r₂.[item identifiers] = ∅
r₂.[parent]           = a

a.[type]             = σ(ɸ(M, pred(a), ASSOCIATION_TYPE, *, *, *), 5)
a.[scope]            = σ(ɸ(M, con(a), SCOPE_MEMBER, *, *, *), 5)
a.[roles]            = {r₁, r₂}
a.[reifier]          = σ(ɸ(M, *, REIFIES, *, *, id(a)), 1)
a.[item identifiers] = σ(ɸ(M, id(a), ITEM_IDENTIFIER, *, *, *), 5)
a.[parent]           = tm
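
The expansion of a binary association quint can be sketched as follows. As in the figures above, each property is left as the set returned by σ; the tuple representation of quints and the dictionary keys are assumptions of the sketch.

```python
ANY = object()  # the "*" wildcard


def phi(model, *pattern):
    pattern += (ANY,) * (5 - len(pattern))  # omitted positions are wildcards
    return {q for q in model
            if all(p is ANY or p == c for p, c in zip(pattern, q))}


def sigma(model, n):
    return {q[n - 1] for q in model}


def expand_binary(M, a, tm):
    """Expand one binary association quint a into an association item."""
    subj, pred, ident, con, val = a

    def lookup(prop):
        # Value of a model-level quint describing the predicate of a.
        return sigma(phi(M, pred, prop), 5)

    roles = [
        {"player": subj, "type": lookup("SUBJECT_ROLE")},
        {"player": val, "type": lookup("OBJECT_ROLE")},
    ]
    return {
        "type": lookup("ASSOCIATION_TYPE"),
        "scope": sigma(phi(M, con, "SCOPE_MEMBER"), 5),
        "roles": roles,
        "reifier": sigma(phi(M, ANY, "REIFIES", ANY, ANY, ident), 1),
        "item identifiers": sigma(phi(M, ident, "ITEM_IDENTIFIER"), 5),
        "parent": tm,
    }
```

Applied to the composed-by quint of the example, this yields an association of type composed-by in which tosca plays the role work and puccini the role composer.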

Conclusion

This paper has presented the Q model, together with representations in it of topic maps and RDF that are structurally the same. The Q model is formal and compact, and adding individual statements to it can be made idempotent. This means that it is possible to create efficient Q engines which are topic map and RDF implementations at the same time.

This paper has concerned itself only with the data model layer of the language families, but there are higher levels as well, primarily the query and schema languages. To really be an effective vehicle for research and implementation Q has to be able to also support these.

As regards query languages, a specification of the tolog topic maps query language based on Q is being prepared [Garshol05], so it should be possible to use Q for query language specifications. As SPARQL (the RDF query language) is more or less a subset of tolog, it should be possible to map it to Q in much the same way. As this implies, suitably annotated models should be queryable with both query languages, regardless of whether they are "really" RDF or topic maps.

As for schema languages, it is certainly possible to specify TMCL (the standard topic maps constraint language) on top of Q. The same is true for RDF Schema and OWL, though TMCL and RDFS/OWL are so different that cross-mappings are only going to be partially meaningful. Applying RDFS and OWL to topic maps is not going to be easy, either, as the inferences in RDFS and OWL completely disregard scope, which does not exist in RDF. However, it should be possible to develop a theory of scope that enables RDFS and OWL to be used with topic maps. Sadly, this is, again, beyond the scope of this paper.

Clearly, much work remains to be done to fully exploit the Q model, but this paper has laid the foundation for that work.

Notes

1.

None of what is written here is wrong, strictly speaking. However, some concerns are overlooked, and there is a better way to approach this. I will produce an improved version of this paper at http://www.ontopia.net/topicmaps/materials/quads.html where these issues will be addressed.

2.

This paragraph is quite simply meaningless. I have no idea what was in my mind when I wrote it. I will produce an improved version of this paper at http://www.ontopia.net/topicmaps/materials/quads.html where this problem will be fixed.


Acknowledgments

The author is indebted to Robert Barta and Dmitry Bogachev for constructive criticism of the model presented herein; feedback from Robert Barta also improved the maths substantially. Geir Ove Grønmo reviewed an early draft of the paper and caught many errors, as did Steve Pepper. Parts of the paper are informed by discussions in the RDFTM working group, consisting of the author, Steve Pepper, Fabio Vitali, Valentina Presutti, and Nicola Gessa. Other parts were formulated in discussions with Geir Ove Grønmo.


Bibliography

[Barta05] The tau Model, Formalizing Topic Maps, R. Barta, G. Salzer, APCCM 2005, Newcastle, Australia, January/February 2005. Proceedings published by Australian Computer Society 2005, ISBN 1-920682-25-2.

[Bogachev04] TMAssert, D. Bogachev, ISO SC34 Reference Model workshop, Montréal, August 2004. http://homepage.mac.com/dmitryv/TopicMaps/TMRM/TMAssert.pdf

[Carroll05] Named Graphs, Provenance and Trust, J. Carroll, P. Hayes, C. Bizer, and P. Stickler. Proceedings of WWW 2005, May 10-14, 2005, Chiba, Japan. http://www.wiwiss.fu-berlin.de/suhl/bizer/pub/Carroll_etall-WWW2005.pdf

[Cregan05] An OWL DL construction for the ISO Topic Map Data Model, A. Cregan, Extreme Markup 2005, 1-5 August, 2005, Montréal, Canada. http://xml.coverpages.org/CreganTMs-OWL200505.pdf

[CWEB] The Cognitive Web, open source project. http://proto.cognitiveweb.org/projects/cweb/multiproject/cweb-tmgraph/

[Garshol02] An RDF Schema for topic maps, L. M. Garshol, 2002-10-07. http://psi.ontopia.net/rdf/

[Garshol03] Living with Topic Maps and RDF, L. M. Garshol, XML Europe 2003, London, UK, 5-8 May 2003. http://www.ontopia.net/topicmaps/materials/tmrdf.html

[Garshol05] tolog, L. M. Garshol, in preparation; hopefully to be accepted for TMRA '05, 6-7 October, 2005, Leipzig, Germany.

[http-range-14] http-range-14, WWW Technical Architecture Group issue. http://www.w3.org/2001/tag/issues.html#httpRange-14

[Khriyenko05] Context Description Framework for the Semantic Web, O. Khriyenko, V. Terziyan, Journal of the Brazilian Computer Society, "Ontologies: Issues and Applications", ISSN 0104-6500, March 2005. http://www.cs.jyu.fi/ai/papers/JBCS-2005.pdf

[Pepper03] Curing the Web's Identity Crisis, S. Pepper, S. Schwab, XML Europe 2003, London, UK, 5-8 May, 2003. http://www.ontopia.net/topicmaps/materials/identitycrisis.html

[RDFTM] A Survey of RDF/Topic Maps Interoperability Proposals, S. Pepper, F. Vitali, L. M. Garshol, N. Gessa, V. Presutti, 2005-03-29. World Wide Web Consortium Working Draft. http://www.w3.org/TR/2005/WD-rdftm-survey-20050329/

[TMRM] ISO 13250-5: Topic Maps — Reference Model, S. Newcomb, P. Durusau, ISO working draft. http://www.isotopicmaps.org/TMRM/TMRM-5.0/TMRM-5.0.pdf

[Tolle04] Understanding data by their context using RDF, K. Tolle, AISTA '04, Centre de Recherche Public Henri Tudor, Luxembourg, 15-18 November 2004. http://www.dbis.informatik.uni-frankfurt.de/~tolle/Publications/2004/AISTA04.pdf


