RDF and Topic Maps: Something new for everyone

Michel Biezunski
mb@infoloom.com
Steven R. Newcomb
srn@coolheads.com

Abstract

What would be involved in a lossless transformation of a topic map into RDF form, and/or from RDF form into Topic Map form? This question forces us to examine the expressive power of each of the two paradigms. When converting RDF data into Topic Map form, each RDF statement becomes an association, and each RDF resource and string becomes a topic. Information that indicates a subject must be distinguished from information that constitutes a subject. Arc labels must become association roles. When converting information in the opposite direction, from Topic Map form to RDF form, a profusion of RDF triples, far in excess of the number of associations and topic characteristics in the topic map, may be required in order to capture the semantics comprehensively. The extra triples capture the distinction between subject indicators and subject constituters, the roles played in topic associations, the fact that a set of triples all emanated from a single association, the fact that a set of triples constitutes the expression of a scope, and the bidirectionality of arcs in topic maps. The exercise of defining lossless transformation between RDF and Topic Maps suggests that Topic Maps are usefully considered to be a way of summarizing certain useful kinds of constellations of RDF triples.

Keywords: Topic Maps; Validating; Transforming

Michel Biezunski

Since the earliest moments of the Topic Maps paradigm, Michel Biezunski has several times played the primary role in making topic maps useful, significant, popular, and understandable. He was the instigator of the ISO standardization process for topic maps, and he was a founding Chair of TopicMaps.Org, the host of the XTM (XML Topic Maps) Specification. He is still working to merge knowledge-based approaches with information management systems, both by designing custom applications and by fostering the development of new standards for the Web.

Steven R. Newcomb

Steven R. Newcomb is an independent consultant in information management. He is co-editor of the ISO 10744 HyTime standard and developer of the GroveMinder technology. He is the founding Chairman of the Conventions for the Application of HyTime (CApH) activity of the Graphic Communications Association Research Institute (now IDEAlliance), the original developer of the Topic Map paradigm. He is also co-editor of ISO/IEC 13250:2000, the Topic Maps information architecture.

Michel Biezunski [InfoLoom]
Steven R. Newcomb [Coolheads Consulting]

Extreme Markup Languages 2001® (Montréal, Québec)

Copyright © 2001 Michel Biezunski and Steven R. Newcomb. Reproduced with permission.

Introduction

The purpose of this paper is to share some recently-gained insights about topic occurrence characteristics, and about the significance and exploitability of occurrence-ness in association templates.

In order to understand these insights, some understanding of the built-in modeling features of the topic maps paradigm is required. We therefore begin this paper with a brief illustrated tutorial about association templates in Topic Maps.

Then we discuss an inherent feature of the topic map paradigm (the notion of occurrence-ness) that has been formalized in the "topic-occurrence" association template.

Finally, we note that the specialized syntaxes normally used for specifying topic occurrences actually interfere with the ability of topic map authors to fully exploit the Topic Map paradigm. If they use these specialized syntaxes, authors must choose whether to use their association templates to validate the occurrence-ness of their occurrences, or to perform other kinds of validation on them. (However, if they are willing to specify topic occurrences by means of <association> elements, authors can have full access to the entire power of the paradigm.)

Incidentally, we discuss the fact that, in the topic map paradigm, a built-in (but so far only implicit) validation constraint is available that is, in some sense, the opposite of the formal constraint imposed by "occurrence-ness".

Association templates in a nutshell

First, what is an association?

The term topic association (or, simply, association) ambiguously combines three different meanings:

  • An <association> element in a topic map. (Every <association> element connects two or more topics to one another, through itself. Every <association> element represents a specific relationship between the subjects of the connected topics.)
  • An a-node in a topic map graph. (Every a-node connects two or more topics to one another, through itself. Every a-node represents a specific relationship between the subjects of the connected topics.)
  • The relationship represented by an <association> element and/or by an a-node.
When discussing topic maps, it's extremely useful (and, as a practical matter, it's vitally important) to have a single term that encompasses all three of these definitions. It is frequently necessary to invoke all three definitions simultaneously, and having a single term for all three makes it possible to be both precise and economical at the same time. The downside of using the term "association" is that it is often misused to mean only one of them, without saying which one is meant. Also, even when the term is used deliberately to mean all three, readers often take it to mean only one of them, which can lead to serious confusion. For all purposes of this paper, therefore, we hereby give notice that we have made an effort to use terms consistently, as follows:
  • When we are discussing an <association> element in a topic map, we always surround the term with angle brackets. Readers are hereby warned that when the word association appears without surrounding angle brackets, its definition is not limited to "<association> element," but also includes "a-node" and "relationship representable by either an <association> element or an a-node."
  • When we are discussing an a-node in a topic map graph, we always indicate this by using the term a-node.
  • When we are discussing the relationship representable by an <association> element and/or by an a-node, we use the term relationship.
  • When we mean all three definitions at the same time, we use the term association, without angle brackets. For example, this is the case whenever we use the compound term, association template. An association template applies constraints in all three contexts:
    • Association templates constrain the <association> elements that specify them as their templates.
    • Association templates constrain the complexes of arcs of which a given a-node serves as the nexus.
    • Association templates constrain the relationships that are representable by <association> elements and by a-nodes. Association templates are a special kind of association class.
    A single association template, considered abstractly, can at the same time constrain any number of <association> elements, any number of a-nodes, and all of the relationships represented by both kinds of representations.

Two "view levels" for the same topic map

Although a-nodes and <association> elements have much in common, and although every <association> element "demands the existence of" an a-node when the element is processed, it is not true that all a-nodes are the result of processing <association> elements. For example, some a-nodes result from processing <occurrence> elements. This is because, in a topic map graph, all relationships between topics and their characteristics, including but not limited to the relationships between topics and their occurrences, are represented as a-nodes.

Thus, there are really two distinct perspectives, or "view levels" on the "true" nature of a given topic map, and they are both valid and useful. Serious misunderstandings have occurred when two practitioners have wrongly assumed that they are both having a discussion about a topic map in the context of the same view level of that topic map. Therefore, when discussing topic maps with others, it is often vital to establish which of the two contexts governs one's statements about "topics", for example. In this paper, we use the following conventions to clarify the context of our statements:

  1. To establish a view level context that focuses on the "characteristics" of topics, i.e., their names, their occurrences, and their roles in author-defined associations, and only on the topics and associations that authors have chosen to express explicitly by means of <topic> and <association> elements:
    • We say, "molecular level". This term is imprecise, at best, but it's what we're using until a better term appears. The reference to the notion of molecules is intended to evoke the idea that a complex of "atomic" constructs -- nodes and arcs in topic map graphs -- is implied by a single topic characteristic, such as a topic occurrence or a topic name, at the "molecular level".
    • We say, "topic characteristics level".
    • We say, "only the topics and associations explicitly declared as <topic>s and <association>s in the syntax".
    • We say something like, "the view of topic maps implied by the current interchange syntaxes", referring to the fact that topic names and topic occurrences both have special supporting syntactic structures.
      (By the way, the very existence of these special syntactic structures has misled many into thinking that the semantic structures that these distinct syntactic structures represent are also somehow distinct from the structure of all other connections between topics. Another common misconception caused by the existence of these special syntactic facilities is that, at a fundamental level, topic maps must represent or employ certain subjects (such as the notion of occurrence-ness in Topic Maps) in some way other than as topics. At topic map graph construction time, the specialized syntaxes for topic occurrences and topic names are processed in such a way as to become nodes and arcs, and they are distinguished from all other nodes and arcs only by their subjects, in the same way that all other nodes and arcs are distinguished from each other.)
  2. To establish a view level context that focuses on the nodes and arcs of a topic map graph:
    • We say, "atomic level".
    • We use the term, "topic map graph level".
    • We use the terms a-node, t-node and s-node, or we refer to the various arc types found only in topic map graphs.

Association classes

Associations can be typed. For example, we may wish to represent specific musical performances as associations between individual musicians and individual pieces of music. We may wish to constrain all representations of performances in such a way that one member is always a specific musician, and another member is always a specific piece of music.

Figure 1: Scope

Lena's performance of the song "Stormy Weather" is in the scope of the film, Stormy Weather, and in the soundtrack album.

Figure 1 shows an instance of our "musical performance" class of associations, diagrammed from the atomic perspective of a topic map graph (see Note 1). In all of these figures, the boxes representing a-nodes (each of which is itself the nexus of a complex of arcs that collectively represent a specific relationship) contain a large capital A. The boxes that contain a large capital T represent t-nodes, and the boxes that contain a large capital S represent s-nodes. T-nodes represent topics whose subjects are not specific relationships. A-nodes represent associations, and they can be regarded as topics whose subjects are specific relationships.

In Figure 1, we see that the scope of the performance includes a film and a soundtrack. While the scope of the performance is not germane to our discussion of association templates, all associations in all topic maps always have scopes, and scope is such a fundamental concept in the Topic Maps paradigm that we represent it in these diagrams for the sake of completeness.

Association without types (classes) or models (templates)

Figure 2: Association roles

In this representation of Lena's performance as an association, the two association roles are (1) the performer (Lena Horne), and (2) the music performed (the song, "Stormy Weather").

Associations must always specify the roles that their members play in the relationships that they represent. However, the Topic Maps paradigm does not require that an association have a type, model, or other constraints of any kind; all typing and modeling is optional. The "performance" association shown in Figure 2 is not known to be a "performance" association (that would be its type or model, if it had one), but, as far as the Topic Maps paradigm is concerned, it still meets all the requirements of associations in general: it has at least one role, and it has at least one scope.
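To make the atomic-level picture concrete, here is a minimal Python sketch of the a-node in Figure 2, with the scope shown in Figure 1. The class name `ANode`, the topic ids, and the choice to model a scope as a set of topic ids are illustrative assumptions, not part of any Topic Maps specification or software. The check encodes only the two general requirements stated above: at least one role, and a scope.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

@dataclass
class ANode:
    # Hypothetical representation: role topic id -> player topic id.
    roles: Dict[str, str]
    # Scope modeled (assumption) as a set of topic ids; None means "no scope".
    scope: Optional[FrozenSet[str]]
    # No type, model, or template is required by the paradigm.
    template: Optional[str] = None

def meets_general_requirements(a: ANode) -> bool:
    """Check only what every association needs: a role and a scope."""
    return len(a.roles) >= 1 and a.scope is not None

lena_performs = ANode(
    roles={"performer": "lena-horne", "music": "song-stormy-weather"},
    scope=frozenset({"film-stormy-weather", "soundtrack-album"}),
)
print(meets_general_requirements(lena_performs))  # True
```

Note that nothing here says the a-node is a "performance"; at this level it is just a relationship with two roles and a scope.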

Typed associations that are unconstrained by models ("templates")

Figure 3: Class instance

Lena's performance is an instance of the class of all performances, but the class of all performances here imposes no formal constraints on its instances.

In the Topic Maps paradigm, it is possible for associations to be instances of association classes, even when those classes impose on them no formal constraints, such as a set of roles. Figure 3 shows that Lena's performance is an instance of the class of all performances. In other words, the subject of the "performance" t-node is a class of thing that has, as one of its instances, Lena's performance. However, according to the diagram in Figure 3, Topic Map paradigm software can't know any more than the fact that the topic map author has asserted a class-instance relationship between the class here identified as "performance", and the instance here identified as Lena's performance of "Stormy Weather". The "performance" t-node should probably have a subject indicator (not shown in the diagram) that somehow indicates what it means for a thing to be a performance. However, even if that subject indicator says that all performances must be associations that have a musician role and a music role, the Topic Map paradigm engine functionality that could ordinarily verify the conformance of instances to at least certain kinds of constraints will not be invoked, because the constraints, if any, have not been specified in a way that could invoke such software. Such software is invoked only when the associations to be validated specify association templates that, in turn, specify conformance constraints.
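The following hypothetical sketch illustrates the point. Asserting that Lena's performance is an instance of "performance" simply adds another a-node, in which the performance a-node plays the "instance" role; nothing in that assertion gives template-checking software anything to verify. All names and ids are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ANode:
    id: str
    roles: Dict[str, str]           # role topic id -> player id
    template: Optional[str] = None  # at most one association template

# Lena's performance, with no template (as in Figure 3).
performance = ANode("lena-performs-sw",
                    {"performer": "lena-horne", "music": "song-stormy-weather"})

# The class-instance assertion is itself an a-node, in which the
# performance a-node (regarded as a topic) plays the "instance" role.
is_a = ANode("a-class-instance", {"class": "performance", "instance": performance.id})

def template_validation_applies(a: ANode) -> bool:
    """Conformance checking runs only if the association specifies a
    template; class membership alone never triggers it."""
    return a.template is not None

print(template_validation_applies(performance))  # False
```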

Associations constrained by models ("association templates")

In the Topic Maps paradigm, any association can optionally have exactly one template, or model, with or without being an instance of any number of other association classes. An association template is itself an association class, but it has the extra feature that it specifies a model to which all of its instances are expected to conform, and the model that it expresses can be used by Topic Map paradigm software to verify whether all associations that are declared to be instances of the template do, in fact, conform to the model.

Every association template specifies the member roles of all of the associations that are instances of it. The set of member roles that it specifies is comprehensive by definition: it is a complete set of the only roles that are relevant to the class of associations that the template represents.

Figure 4: Association template

Here, the "performance" association class is an association template. All "performance" associations have two roles: (1) "performer", and (2) "music".

Figure 4 shows that "performance" associations have a "performer" role and a "music" role. The model for each of the member roles is specified by a distinct a-node that represents the association between the topic whose subject is the template (playing the "template" role in the role-specifying a-node), and the topic whose subject is the role itself (playing the "role" role in the role-specifying a-node). Each role-specifying a-node is itself an instance of an association template (not shown) that is defined for all topic maps by the Topic Maps paradigm, called the "template-role-RPR" association template. (The RPR role of the "template-role-RPR" association template will be discussed later.)

As can be seen in Figure 4, both the "Lena performs Stormy Weather" a-node and the "performance" template t-node use exactly the same two topics to specify roles. The constraint imposed by the "performance" template in Figure 4 is that, in order that a given member of a given association instance be interpretable in terms of the "performance" template, it must play a role that is represented by the same t-node that plays the "role" role in one of the "template-role-RPR" associations that define the "performance" template. There are two such role topics shown in Figure 4: "performer" and "music".
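A minimal sketch of this role check follows, assuming a toy in-memory graph in which each a-node is a Python dict. The ids and the dict layout are hypothetical. The function collects the topics that play the "role" role in the "performance" template's "template-role-RPR" a-nodes, and then tests that every member of the association plays one of them.

```python
from typing import List, Set

# Hypothetical graph: each a-node has an "id", a "template", and "roles"
# mapping role topic id -> player id.
a_nodes: List[dict] = [
    # Role-specifying a-nodes (instances of "template-role-RPR").
    {"id": "trr-1", "template": "template-role-RPR",
     "roles": {"template": "performance", "role": "performer"}},
    {"id": "trr-2", "template": "template-role-RPR",
     "roles": {"template": "performance", "role": "music"}},
    # The association to be validated.
    {"id": "lena-performs-sw", "template": "performance",
     "roles": {"performer": "lena-horne", "music": "song-stormy-weather"}},
]

def template_roles(template_id: str, nodes: List[dict]) -> Set[str]:
    """Topics playing the "role" role in the template's role-specifying a-nodes."""
    return {
        n["roles"]["role"]
        for n in nodes
        if n.get("template") == "template-role-RPR"
        and n["roles"].get("template") == template_id
    }

def roles_conform(assoc: dict, nodes: List[dict]) -> bool:
    """Every member must play a role defined by the association's template."""
    return set(assoc["roles"]) <= template_roles(assoc["template"], nodes)

lena = a_nodes[2]
print(sorted(template_roles("performance", a_nodes)))  # ['music', 'performer']
print(roles_conform(lena, a_nodes))  # True
```

A member playing, say, a "composer" role would make `roles_conform` return False, because no "template-role-RPR" a-node of the "performance" template names that role.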

Recognized players of roles

As already noted, in an association template, each member role is characterized by a distinct "template-role-RPR" association. We have already discussed the "template" role and the "role" role of "template-role-RPR" associations; all "template-role-RPR" associations must have a "template" member and a "role" member. The "RPR" ("Recognized Player of Role") role is optionally played by a topic whose subject is the class of subjects of which the subjects of all topics (or associations) that play the templated role must be instances.

Figure 5: RPR constraint

The recognized player of role (RPR) constraints: anything that plays the "performer" role must be a "musician", and anything that plays the "music" role must be a "song".

Figure 5 illustrates that the recognized player of role (RPR) constraint on the "performer" role is that "performer"s must be "musician"s. Similarly, at least in the "performance" template, anything that plays the "music" role must be a "song". This means that, if the topic that plays the "performer" role plays the "instance" role in a "class-instance" association in which the "class" role is played by the "musician" topic, the RPR constraint for the role will be satisfied. This is shown in Figure 6. However, the RPR constraint would also be satisfied if "Lena" were an instance of any topic that is a subclass of the "musician" topic, such as "singer" (not shown).
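The RPR check, including the subclass case, can be sketched as follows under stated assumptions: the "class-instance" and "superclass-subclass" associations are reduced to hypothetical sets of pairs, and, unlike Figure 6, Lena is made an instance of "singer" (a subclass of "musician") so that the subclass path is exercised.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-ins for "class-instance" a-nodes: (class, instance).
class_instance: Set[Tuple[str, str]] = {
    ("singer", "lena-horne"),
    ("song", "song-stormy-weather"),
}
# Hypothetical stand-ins for "superclass-subclass" a-nodes: (superclass, subclass).
superclass_subclass: Set[Tuple[str, str]] = {("musician", "singer")}

# RPR per role of the "performance" template.
rpr: Dict[str, str] = {"performer": "musician", "music": "song"}

def superclasses(cls: str) -> Set[str]:
    """cls plus every class reachable through superclass-subclass pairs."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sup, sub in superclass_subclass:
            if sub == current and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result

def is_instance_of(player: str, cls: str) -> bool:
    """True if player is an instance of cls or of any subclass of cls."""
    return any(
        inst == player and cls in superclasses(direct_cls)
        for direct_cls, inst in class_instance
    )

def rpr_satisfied(roles: Dict[str, str]) -> bool:
    """Every role that has an RPR must be played by an instance of it."""
    return all(
        is_instance_of(player, rpr[role])
        for role, player in roles.items()
        if role in rpr
    )

print(rpr_satisfied({"performer": "lena-horne", "music": "song-stormy-weather"}))  # True
```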

Figure 6: RPR constraint satisfied

Since Lena is a musician and "Stormy Weather" is a song, the "performance" template's RPR constraints are satisfied by the "Lena performs Stormy Weather" association represented in Figures 1 through 6.

(More information about topic map graphs, t-nodes, association templates, etc. can be found at http://www.topicmaps.net/pmtm4.htm.)

The "topic-occurrence" association template

Introduction

The "topic-occurrence" association template is not ordinarily used as the template of any <association> element, because occurrences are not ordinarily expressed by means of <association> elements (although, of course, they can be). Instead, in the XTM DTD, for each occurrence of a topic, a special <occurrence> subelement type of the <topic> element type is used to express the association between the containing topic and its occurrences.

The special syntactic features provided for occurrences in most interchange syntaxes for Topic Maps actually obscure the fact that, in a topic map graph, there is no specialized structure that exists especially to represent the connections between topic occurrences and their corresponding topics. Instead, there is a published association template whose subject is the class of relationship that exists between topics and their occurrences.

Regardless of whether an occurrence is expressed as an <occurrence> element or as an <association> whose template is the published "topic-occurrence" association template, the result in the topic map graph is the same: there is an a-node that represents the association between the topic and its occurrence. This a-node's association template requires that every "topic-occurrence" a-node has two roles, "topic" and "occurrence".
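To illustrate that both syntaxes demand the same a-node, the sketch below parses two simplified XTM 1.0-style fragments with Python's standard `xml.etree.ElementTree`. Assumptions: the XTM default namespace is omitted for brevity; `topic-occurrence`, `topic`, and `occurrence` are hypothetical local ids standing in for the published subjects; the <association>'s template is taken, for illustration only, from its <instanceOf>; and the a-node is reduced to a (template, set of role/player pairs) tuple, with the occurrence represented by its resource address.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

OCCURRENCE_FORM = """<topic id="b737" xmlns:xlink="http://www.w3.org/1999/xlink">
  <occurrence>
    <resourceRef xlink:href="http://example.com/b737.jpg"/>
  </occurrence>
</topic>"""

ASSOCIATION_FORM = """<association xmlns:xlink="http://www.w3.org/1999/xlink">
  <instanceOf><topicRef xlink:href="#topic-occurrence"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#topic"/></roleSpec>
    <topicRef xlink:href="#b737"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#occurrence"/></roleSpec>
    <resourceRef xlink:href="http://example.com/b737.jpg"/>
  </member>
</association>"""

def a_nodes_from_occurrences(topic_xml: str) -> list:
    """Each <occurrence> demands one "topic-occurrence" a-node."""
    topic = ET.fromstring(topic_xml)
    result = []
    for occ in topic.findall("occurrence"):
        ref = occ.find("resourceRef").get(XLINK_HREF)
        result.append(("topic-occurrence",
                       frozenset({("topic", "#" + topic.get("id")),
                                  ("occurrence", ref)})))
    return result

def a_node_from_association(assoc_xml: str) -> tuple:
    """An <association> demands one a-node; roles come from <roleSpec>s."""
    assoc = ET.fromstring(assoc_xml)
    template = assoc.find("instanceOf/topicRef").get(XLINK_HREF).lstrip("#")
    members = set()
    for m in assoc.findall("member"):
        role = m.find("roleSpec/topicRef").get(XLINK_HREF).lstrip("#")
        # The player is a direct <topicRef> child, or else a <resourceRef>.
        player = m.find("topicRef")
        if player is None:
            player = m.find("resourceRef")
        members.add((role, player.get(XLINK_HREF)))
    return (template, frozenset(members))

print(a_nodes_from_occurrences(OCCURRENCE_FORM) == [a_node_from_association(ASSOCIATION_FORM)])  # True
```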

The "recognized player of role" constraint for the "topic" role is that whatever plays the "topic" role must be a topic. However, there is no meaningful constraint here, since anything that can be a member of any association must, by definition, be a topic. In other words, the constraint that only a topic can be permitted to play the "topic" role effectively places no constraints whatsoever on the player of the "topic" role. In still other words, all topics are eligible to have occurrences. (Associations, too, can have occurrences, because they, too, are regardable as topics, at least to the extent that they can have subject indicators and they can play roles in associations.)

What is occurrence-ness?

In the Topic Maps paradigm, the meaning of the term "occurrence" is "relevant information" -- information relevant to the subject of the topic of which the information is said to be an "occurrence". The subject of any topic that plays the "occurrence" role in a "topic-occurrence" association must always be an information resource. In other words, the occurrence topic must have an "addressable subject", rather than a subject that is not an information resource. Syntactically, in the XTM DTD, this means that the occurrence itself must be addressed by means of a <resourceRef> rather than by a <subjectIndicatorRef>. This particular constraint on "topic-occurrence" associations is built into the topic maps paradigm itself. (As we shall discuss shortly, the published subject represented by the "topic-occurrence" association template allows topic map authors to combine this same constraint with other, more specialized constraints in user-defined association templates.) This constraint is the only formal (i.e., automatically verifiable) constraint imposed by the notion of occurrence-ness in Topic Maps. In addition to this formal constraint, there is also an unverifiable semantic intent: it's expected that whatever information resource is referenced as an occurrence of a topic is regarded by the author of the topic map as somehow relevant to the subject of that topic. Again, occurrence-ness is (1) information-ness, and (2) relevance to some specific topic, and nothing else.

One of the most powerful ideas that has come out of the quest to federate information expressed in RDF terms with information expressed in terms of Topic Maps is the recognition of the mutual exclusivity of two ways of regarding a single information resource. One way is to regard the information resource as "constituting" the subject of some topic, while the other way is to regard the resource as somehow (compellingly!) "indicating" the subject of some topic. The notion that an information resource can itself constitute a subject is intimately related to the notion of occurrence-ness. In fact, formally speaking, occurrence-ness amounts to a constraint on "topic-occurrence" associations that whatever resource is the subject identity point for the topic that plays the "occurrence" role must be regarded not in terms of what it means, but in terms of itself as an information resource.

If occurrence-ness is really the same thing as "subject-constitution", we are naturally led to ask, "Why isn't the opposite constraint also available for use in association templates?" In other words, what association template offers us the ability to constrain role player subjects not to be information resources considered as subject constituters? At this writing, there is no such published association template. We propose that the Topic Map standard should offer means, such as a published association template, that can be used to establish the RPR constraint that the subject of the topic playing a given role cannot itself be an information resource.
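The two mutually exclusive ways of regarding one information resource, and the complementary constraint proposed above, can be sketched as a pair of predicates. In this hypothetical model, `resource_ref` plays the part of the XTM <resourceRef> (the resource constitutes the subject) and `subject_indicator_ref` that of <subjectIndicatorRef> (the resource indicates the subject). `has_non_addressable_subject` corresponds to the proposed, not yet published, constraint. The URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Topic:
    id: str
    # Subject constituter: the topic's subject IS this resource (cf. <resourceRef>).
    resource_ref: Optional[str] = None
    # Subject indicator: this resource INDICATES the subject (cf. <subjectIndicatorRef>).
    subject_indicator_ref: Optional[str] = None

def has_addressable_subject(t: Topic) -> bool:
    """Formal aspect of occurrence-ness: the subject is an information resource."""
    return t.resource_ref is not None

def has_non_addressable_subject(t: Topic) -> bool:
    """The opposite constraint proposed above; no published template exists for it."""
    return not has_addressable_subject(t)

# One placeholder URL, regarded in the two different ways.
page = "http://example.com/boeing-737.html"
the_page_itself = Topic("page", resource_ref=page)              # could play "occurrence"
the_airplane_class = Topic("b737", subject_indicator_ref=page)  # page only indicates the subject

print(has_addressable_subject(the_page_itself), has_addressable_subject(the_airplane_class))  # True False
```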

Occurrence types

When we want to say that some information somewhere is relevant to some subject, we often also want to say more precisely what is the nature of the relevance. After all, two different pieces of information that are both relevant to a topic may tend to influence the user's understanding of the topic in wildly different ways, and the occurrences themselves can be extremely diverse. How can topic map authors (and the users of the topic maps that they author) distinguish between different kinds of occurrences?

Both the XTM DTD and the ISO 13250 DTD provide ways to specialize occurrences. In the XTM DTD, <occurrence> elements contain <instanceOf> elements, each of which refers to a topic whose subject is a class of occurrence of which the information resource referenced by the <resourceRef> contained in the <occurrence> is an instance. In the ISO 13250 DTD, the <type> element in the content of the <occurs> element has the same function. But what does this mean at the atomic (topic map graph) level? What is the impact of a <type> element in the content of an <occurs> element, or of an <instanceOf> element in the content of an <occurrence>, on the topic map graph?

We'll use an example in order to make the foregoing question clearer. Let's assume that the subject of the topic that has the occurrence is the class of airplanes known as Boeing 737s, and that the occurrence itself is a photograph of a particular Boeing 737 airplane. We want to characterize such occurrences as photographs, in addition to characterizing them as information resources (occurrences). If we're using the XTM DTD as the interchange syntax for our topic map, in our <topicMap> element we have a <topic> element whose subject is the class of airplanes known as Boeing 737s. This <topic> element contains an <occurrence> element that addresses the photograph by means of a <resourceRef>, and that <occurrence> element contains an <instanceOf> that references another <topic>, whose subject is the class of all information objects that represent photographs. What effect does the existence of this <instanceOf> element have on the graph?

All associations, including "topic-occurrence" associations, can themselves play roles in (usually other) associations. In other words, at least in terms of its ability to be a member of one or more associations, any association can itself be treated as a topic whose subject is the relationship that it represents. The effect of the <instanceOf> element on the topic map graph has been proposed to be that, at the graph level it causes ("demands") the existence of an additional a-node that represents a "class-instance" association in which the topic referenced by the <instanceOf> element plays the "class" role, and the a-node representing the "topic-occurrence" relationship plays the "instance" role. (The scope of the "class-instance" relationship is proposed to be unconstrained.)
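A sketch of this proposed graph-level effect for the Boeing 737 example follows, again using `xml.etree.ElementTree` on a simplified XTM 1.0-style fragment (default namespace omitted; ids and URL hypothetical; the occurrence topic is represented by its resource address for brevity). Each <occurrence> yields a "topic-occurrence" a-node, and each <instanceOf> inside it yields a "class-instance" a-node whose "instance" role is played by that "topic-occurrence" a-node.

```python
import xml.etree.ElementTree as ET
from itertools import count

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

B737_TOPIC = """<topic id="boeing-737" xmlns:xlink="http://www.w3.org/1999/xlink">
  <occurrence>
    <instanceOf><topicRef xlink:href="#photograph"/></instanceOf>
    <resourceRef xlink:href="http://example.com/b737-photo.jpg"/>
  </occurrence>
</topic>"""

def build_a_nodes(topic_xml: str) -> list:
    """Return the a-nodes that the topic's <occurrence> elements demand."""
    topic = ET.fromstring(topic_xml)
    ids = count(1)
    a_nodes = []
    for occ in topic.findall("occurrence"):
        occ_node_id = f"a{next(ids)}"
        a_nodes.append({
            "id": occ_node_id,
            "type": "topic-occurrence",
            "roles": {"topic": topic.get("id"),
                      "occurrence": occ.find("resourceRef").get(XLINK_HREF)},
        })
        # Each <instanceOf> demands a "class-instance" a-node in which the
        # "topic-occurrence" a-node itself plays the "instance" role.
        for ref in occ.findall("instanceOf/topicRef"):
            a_nodes.append({
                "id": f"a{next(ids)}",
                "type": "class-instance",
                "roles": {"class": ref.get(XLINK_HREF).lstrip("#"),
                          "instance": occ_node_id},
                "scope": "unconstrained",  # as proposed in the text
            })
    return a_nodes

for node in build_a_nodes(B737_TOPIC):
    print(node)
```

The photograph class is attached to the relationship (the "topic-occurrence" a-node), not to the Boeing 737 topic, which is what keeps the airplane class from being mistaken for a kind of photograph.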

While the proposed impact of <instanceOf> elements contained in <occurrence> elements makes sense and works well, it places enormous emphasis on a particular class of relationship, the "topic-occurrence" relationship. This particular relationship class is heavily emphasized for two reasons: history and accessibility.

History: The notion of "occurrences" is historically essential to the development of topic maps, rooted as they are in thinking about traditional back-of-book indexes. Indeed, back-of-book indexes are little more than sets of occurrences. They contain pointers, usually expressed as page numbers, to information resources that are relevant to particular topics. The historical importance of occurrence-ness is reflected in virtually all of the syntaxes for topic maps that have ever been proposed or adopted.

Accessibility: The notion of "occurrences" has proven to be easy for many people to grasp. It would be difficult to overstate the importance of the emphasis that Topic Maps have placed on the notion of "occurrence" in garnering popular support for the Topic Maps paradigm.

And yet, as we've already discussed, occurrence-ness only amounts to two constraints on things that are asserted to be occurrences:

  1. a formal constraint that the referenced thing must be an information resource, and be considered as an information resource, and
  2. an informal constraint that the content of the information resource must somehow be intended by the topic map's authors to be regarded as relevant to the topic of which they consider the resource to be an occurrence.

Now that we have a broader perspective on topic maps, and recognize that occurrences are really a specialized class of associations between topics, we would like the flexibility to subclass this class of associations, so that we can incorporate the constraints of occurrence-ness into many different classes of association by making each of them a subclass of the "topic-occurrence" class. This approach allows occurrence-ness to be exploited in the same flexible way as all other constraints on role players, and it avoids having to use a single association template, the "topic-occurrence" template, for all occurrence relationships. If we have to use the "topic-occurrence" template for all relationships in which we need to constrain the role players to addressable subjects, we will always have to choose between using the template validation feature for occurrence-ness (addressable-subject-ness), or using the template validation feature for other constraints on recognized players of roles. We will not be able to do both at the same time.

How can an association template be specified to be a subclass of another association template? In terms of our example, the same question can be restated:

How can we specify an association template -- a template for "topic-photograph" associations -- that will impose all of the following constraints:

  1. The subject of the topic that is the occurrence (the topic that plays the "photograph" role) must be an information resource. This is the formal aspect of occurrence-ness.
  2. The subject of the topic that is the occurrence (the topic that plays the "photograph" role) must be relevant to the subject of the topic of which it is an occurrence. This is the informal aspect of occurrence-ness -- the author's use of the notion of "occurrence-ness" indicates that the author of the topic map felt that the occurrence is relevant.
  3. The subject of the topic that is the occurrence (the topic that plays the "photograph" role) must be a photograph. This is the additional, user-defined constraint that we wish to combine with the constraints imposed by occurrence-ness.

First of all, we must indicate that the "topic-photograph" template is a subclass of the "topic-occurrence" template. In the interchange syntax and in the graph, this is a simple matter of specifying a "superclass-subclass" association between the two templates, in which the "superclass" role is played by the published "topic-occurrence" template, and the "subclass" role is played by the user-defined "topic-photograph" template.

While it is necessary to establish the "superclass-subclass" relationship between the two templates, it is not, by itself, sufficient. The problem is that, even though we have specified that one template is a subclass of another, we have not yet specified which role in the superclass template corresponds to which role in the subclass template. It is vital to know that the "photograph" role in the subclass is the role that corresponds to the "occurrence" role in the superclass, and that the "topic" role in the subclass corresponds to the "topic" role in the superclass.

In topic map templates, each association role is represented by a "template-role-RPR" a-node. The correspondences between the roles can be specified by asserting "superclassTemplateRole-subclassTemplateRole" associations between them. In our example, the "template-role-RPR" a-node that represents the "occurrence" role in the "topic-occurrence" template can play the "superclassTemplateRole" role in an a-node whose "subclassTemplateRole" role is played by the "template-role-RPR" a-node that represents the "photograph" role in the "topic-photograph" association template.
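Putting the pieces together, here is a hypothetical sketch of how software might validate a "topic-photograph" association once the "superclass-subclass" and "superclassTemplateRole-subclassTemplateRole" associations exist. Templates, facts, and ids are plain Python dicts and sets invented for illustration. The occurrence-ness constraint (constraint 1 above) is inherited only by the role that corresponds to the superclass's "occurrence" role; the RPR constraint (constraint 3) is checked directly; constraint 2, relevance, is informal and cannot be verified.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical templates: template id -> {role id: RPR class, or None if no RPR}.
templates: Dict[str, Dict[str, Optional[str]]] = {
    "topic-occurrence": {"topic": None, "occurrence": None},
    "topic-photograph": {"topic": None, "photograph": "photograph"},
}
# The "superclass-subclass" association between templates: subclass -> superclass.
template_superclass: Dict[str, str] = {"topic-photograph": "topic-occurrence"}
# "superclassTemplateRole-subclassTemplateRole" associations:
# (subclass template, subclass role) -> corresponding role in the superclass.
role_correspondence: Dict[Tuple[str, str], str] = {
    ("topic-photograph", "topic"): "topic",
    ("topic-photograph", "photograph"): "occurrence",
}
# Hypothetical facts about role-playing topics.
addressable: Set[str] = {"photo-17", "spec-sheet-3"}  # subjects are information resources
instance_of: Dict[str, Set[str]] = {
    "photo-17": {"photograph"},
    "spec-sheet-3": {"document"},
    "print-in-archive": {"photograph"},  # known only via a subject indicator
}

def occurrence_roles(template: str) -> Set[str]:
    """Roles of `template` that inherit the "occurrence" role's constraint."""
    if template == "topic-occurrence":
        return {"occurrence"}
    superclass = template_superclass.get(template)
    if superclass is None:
        return set()
    inherited = occurrence_roles(superclass)
    return {role for role in templates[template]
            if role_correspondence.get((template, role)) in inherited}

def validate(template: str, roles: Dict[str, str]) -> bool:
    """Check constraints 1 and 3; constraint 2 (relevance) is informal."""
    for role, player in roles.items():
        if role not in templates[template]:
            return False  # not a role of this template
        rpr = templates[template][role]
        if rpr is not None and rpr not in instance_of.get(player, set()):
            return False  # user-defined RPR fails (direct instances only)
        if role in occurrence_roles(template) and player not in addressable:
            return False  # inherited occurrence-ness fails
    return True

print(validate("topic-photograph", {"topic": "boeing-737", "photograph": "photo-17"}))          # True
print(validate("topic-photograph", {"topic": "boeing-737", "photograph": "spec-sheet-3"}))      # False
print(validate("topic-photograph", {"topic": "boeing-737", "photograph": "print-in-archive"}))  # False
```

Without the role correspondence, `occurrence_roles("topic-photograph")` would be empty, and the occurrence-ness constraint could not be applied to the "photograph" role; this is why the "superclass-subclass" association alone is not sufficient.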

Notes

1. In Figures 1 through 6, "Lena" refers to the incomparable Lena Horne, whose startling rendition of "Stormy Weather" in the early film Stormy Weather launched her very long and distinguished career.

