Metadata on the Web: On the integration of RDF and Topic Maps

Paolo Ciancarini
cianca@cs.unibo.it
Riccardo Gentilucci
gentiluc@cs.unibo.it
Marco Pirruccio
pirrucci@cs.unibo.it
Valentina Presutti
presutti@cs.unibo.it
Fabio Vitali
fabio@cs.unibo.it

Abstract

Meta-information provides an additional layer of abstraction on web documents that can be used for sophisticated applications relying on the precise semantic characterization of their content. Two leading standards, RDF and Topic Maps, compete as the model through which to express metadata. These two models are sufficiently different to make back-and-forth conversion a difficult and imprecise task. In this paper, we introduce META, a set of integrated tools that help in editing, navigating, and converting metadata expressed in either language.

Keywords: Topic Maps; RDF; Metadata; Editing/Authoring

Paolo Ciancarini

Riccardo Gentilucci

Riccardo Gentilucci holds a Laurea degree in Computer Science from the University of Bologna.

Marco Pirruccio

Marco Pirruccio holds a Laurea degree in Computer Science from the University of Bologna.

Valentina Presutti

Valentina Presutti holds a Laurea degree in Computer Science from the University of Bologna and has been a Ph.D. student since January 2003.

Fabio Vitali

Fabio Vitali is a professor at the Department of Computer Science at the University of Bologna. He holds a Laurea degree in Mathematics and a Ph.D. in Computer and Law, both from the University of Bologna. His research interests include markup languages; distributed, coordinated systems; and the World Wide Web. He is the author of several papers on hypertext functionalities, the World Wide Web, and XML.

Paolo Ciancarini [University of Bologna, Department of Computer Science]
Riccardo Gentilucci [University of Bologna, Department of Computer Science]
Marco Pirruccio [University of Bologna, Department of Computer Science]
Valentina Presutti [University of Bologna, Department of Computer Science]
Fabio Vitali [University of Bologna, Department of Computer Science]

Extreme Markup Languages 2003® (Montréal, Québec)

Copyright © 2003 Paolo Ciancarini, Riccardo Gentilucci, Marco Pirruccio, Valentina Presutti, & Fabio Vitali. Reproduced with permission.

Introduction

The Semantic Web [LHL01] is an on-going large-scale effort to improve the current architecture of the World Wide Web by adding a semantic infrastructure to web resources that can be used for sophisticated data-oriented applications. As its basis, we identify metadata, or information about information, that unambiguously specify machine-understandable facts about web resources.

Through metadata we can select something (most often a web-accessible resource, such as a document or process) and associate it with some information, e.g., a description, classification data, or any other attribute. With metadata, we can also establish explicit relationships between the resources we describe. This allows us to explicitly define the nature of attributes and relationships in terms of the metadata language itself, so that it is possible to add semantic meaning to the things being described. Furthermore, since the semantic layer that characterizes the Semantic Web is built apart from the information resources that are linked by metadata, the addition of semantic contents can be performed without any editing of the original resources.

Even though it is possible to create metadata documents semi-automatically, a more precise approach requires human intervention. Much of the information that can be usefully specified for a resource simply cannot be extracted without some kind of human interpretation. Also, the metadata to be recorded about a resource is often derived from a vocabulary of categories that are relevant for subsequent processes. These vocabularies, called ontologies, can be required to adhere to standards that can only be applied by humans. Once this necessarily intelligent work has been performed, one of the biggest claims of the Semantic Web is that, by formalizing the expression of the semantics of the information, every automatic application will be able to manage, understand, and reason about it.

There are two competing models by which we can express metadata. RDF [Resource Description Framework] [RDF] is a W3C recommendation and is by design meant to form the base of the W3C’s vision of the Semantic Web. Topic Maps [TM] is the ISO 13250 standard and, although developed independently of the W3C, it has several properties that make it an interesting alternative to RDF.

According to [Gar02], the prospect we are facing is the coexistence of the two standards. In fact, given the many years of effort that both the W3C and the ISO groups have independently spent developing them, and because of technical arguments (some of which will be discussed later in this paper), it is probably too late now to work towards a merging of RDF and Topic Maps. Furthermore, since they have been designed for different goals (even if they have some basic concepts in common), they have already been separately accepted by different user communities.

This means that the user community will be forced to choose between two different technologies when approaching the definition of their own domains. In this scenario, it is very important to study all possible ways to make this coexistence easier.

An important characteristic of both models is the fact that they allow a serialization in XML, which makes metadata specifications web resources themselves, amenable to further commenting and descriptions by additional metadata layers, and so on. The model is, therefore, repeatable and stackable, so that a complex web of descriptions, descriptions of descriptions, and so on can be fruitfully generated.

The two languages are rather different even in their basic concepts, and the choice of one model can have far-reaching consequences both on the kind of statements that can be expressed about a resource and, more importantly, on the long-term usefulness of these statements. Very few tools exist so far that provide conversion from and to each model, and most of them suffer from serious drawbacks. Furthermore, the abstractness and complexity of the subject have not helped the design of easy-to-use tools for the generation and examination of metadata documents. For example, most editors for metadata collections are either limited in scope (to a single vocabulary) or do nothing to hide the intrinsic complexity of the syntax.

To partially overcome these difficulties, we set about developing META, three integrated tools for the coherent management of metadata, both for RDF and Topic Maps. META is composed of a metadata editor, a metadata navigator, and a bidirectional converter from RDF to Topic Maps and vice versa.

In this paper, we present our approach to the bidirectional conversion of RDF and Topic Maps, and show how the use of schemas, together with the adoption of PSIs [Published Subject Identifiers] in Topic Maps and standard predicates in RDF, can lead to a painless integration of the two languages. This integration is also instrumental in the creation of a single editing tool and a single navigation tool that can be used for metadata collections expressed in both languages.

In Section “Metadata standards”, we summarize the characteristics of both RDF and Topic Maps, and review a few related works on the conversion and editing of metadata collections. In Section “META: a tool for metadata conversion, editing, and navigation”, we introduce our approach to the conversion of RDF into Topic Maps and vice versa. The following subsections are dedicated to the description of the tools. We conclude with Section “Conclusion”, describing a possible evolution path for our work.

Metadata standards

RDF and RDFS: an introduction

RDF is a W3C recommendation for the expression of metadata on any kind of target, from real life objects to abstract entities, but it is particularly useful for Web resources such as documents or server-side processes. The fundamental model of RDF is composed of three concepts: Resources, Properties, and Statements. A resource is what is being described through metadata, and it is identified by a URI. A property is an attribute we want to associate to the resource. The statement is a triple composed of the resource we want to describe, the property, and the property’s value. Property values can either be literals (i.e., strings) or other resources, which can possibly introduce further and more abstract levels of indirection.

For instance, the sentence “Mario is the author of http://www.mario.org” is a statement in which “http://www.mario.org” is the resource that we are describing, “being the author of” is the property we are associating to it and “Mario” is the literal value of this property. We can graphically express this statement as shown in Figure 1.

Figure 1: A simple RDF model

This graph can be serialized in XML. Two equivalent syntaxes are presented in [RDFXML], an abbreviated and a full syntax, although in this paper we will not make this distinction. The graph is serialized as follows.

<rdf:Description rdf:about="http://www.mario.org">
   <s:author>Mario</s:author>
</rdf:Description>
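
As a rough illustration of the triple model, the following Python sketch reads the serialization above and lists its statements as (resource, property, value) triples. The enclosing rdf:RDF element and the URI bound to the s prefix (http://example.org/schema#) are assumptions added here to make the fragment a complete XML document; they do not appear in the original example. The sketch handles only the simple rdf:Description form shown above, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S_NS = "http://example.org/schema#"  # hypothetical URI for the "s" prefix

# The fragment from the text, wrapped in an rdf:RDF root (an assumption)
# so that the namespace prefixes are declared.
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:s="{S_NS}">
  <rdf:Description rdf:about="http://www.mario.org">
    <s:author>Mario</s:author>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (resource, property, value) triples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree names tags as "{namespace}local"; joining the two
            # parts gives the property URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The value is another resource (rdf:resource) or a literal.
            resource = prop.get(f"{{{RDF_NS}}}resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, value))
    return triples


for triple in extract_triples(DOCUMENT):
    print(triple)
```

Each tuple corresponds to one arc of the graph in Figure 1: the described resource, the property, and the literal value.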

Other important syntax elements in RDF are containers, i.e., structures containing collections of resources. Statements can be made to refer to collections, or collections can be used as values for properties. The RDF model defines three types of containers: bags, i.e., sets with repeatable elements; sequences, i.e., sets with a specified order on their elements; and alternatives, i.e., sets among which to extract one value as the relevant one.

RDFS [RDF Schema] is a W3C working draft [RDF] aimed at defining a description language for vocabularies in RDF. An RDF Schema defines classes of resources and types of relationships that can be used in RDF statements. RDF Schema makes it possible, for example, to validate the value of a property or to constrain its range and domain of applicability.

Using RDF Schema we can add other properties to the metadata defined in the previous example, for instance, defining the web page as a class, adding some constraints on it (the values for the author relation need to be literals, while the domain of values for it are web pages), and applying this class as type for the metadata about “http://www.mario.org”.

<rdf:Description rdf:ID="webpage">
   <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</rdf:Description>

<rdf:Property rdf:ID="author">
   <rdfs:domain rdf:resource="#webpage"/>
   <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
</rdf:Property>

<rdf:Description rdf:about="http://www.mario.org">
   <rdf:type rdf:resource="#webpage"/>
   <s:author>Mario</s:author>
</rdf:Description>

    

Topic Maps: an introduction

Topic Maps, defined by the ISO 13250 standard, is a model for describing knowledge structures and associating them with any kind of information resource. The most important concepts of Topic Maps are Topics, Occurrences, and Associations.

Topics represent any kind of thing we are interested in describing: in order to associate metadata to some entity (a web page, a book, a person, etc.), we create a topic and we specify, as its subject, a URN suitable to identify this entity. In order to associate attributes to our subjects, we use occurrence structures that allow us to specify name-value pairs related to the topics. Through associations, we can define relationships among different topics. Associations are intrinsically bidirectional; in order to distinguish the role of each member in the association, we need to create topics whose sole purpose is to represent a so-called association role type.

In fact, the real way through which we add semantics in Topic Maps is the type system. For example, we can make any of the previous concepts (topics, occurrences, and associations) instances of classes, which are themselves nothing but other topics created purposely for typing.

An important concept of Topic Maps is that of subject. A subject is anything we want to describe. In order to identify a subject, we can use:

  • a subject indicator: an information resource
  • a subject identifier: a locator that refers to a subject indicator

Topic maps contain only subject identifiers, and each topic represents a subject. In order to support topic map interchange and mergeability, subjects can be given a precise semantics, established by an organization promoting a specific standard. This is done through a mechanism called “the publishing of subject identifiers”: the organization defines PSIs and advertises them. The PSIs refer to subjects and give them well-known semantics, making them suitable for typing topics in the role of classes. A PSI must be a URI.

Topic Maps does not prescribe a single XML serialization language. One such language, XTM [XML for Topic Maps] [XTM], seems to be the best candidate to become the standard one. The sentence “Mario is the author of http://www.mario.org” can be written in XTM as follows:

<topic id="tt-person">
   <baseName>
      <baseNameString>Person</baseNameString>
   </baseName>
</topic>

<topic id="tt-webpage">
   <baseName>
      <baseNameString>Web Page</baseNameString>
   </baseName>
</topic>

<topic id="mario">
   <instanceOf>
      <topicRef xlink:href="#tt-person"/>
   </instanceOf>
   <baseName>
      <baseNameString>Mario</baseNameString>
   </baseName>
</topic>

<topic id="page">
   <instanceOf>
      <topicRef xlink:href="#tt-webpage"/>
   </instanceOf>
   <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://www.mario.org"/>
   </subjectIdentity>
</topic>

<topic id="at-author">
   <baseName>
      <baseNameString>Author</baseNameString>
   </baseName>
</topic>

<association id="assoc001">
   <instanceOf>
      <topicRef xlink:href="#at-author"/>
   </instanceOf>
   <member>
      <roleSpec>
         <topicRef xlink:href="#tt-webpage"/>
      </roleSpec>
      <topicRef xlink:href="#page"/>
   </member>
   <member>
      <roleSpec>
         <topicRef xlink:href="#tt-person"/>
      </roleSpec>
      <topicRef xlink:href="#mario"/>
   </member>
</association>

We have defined two topic types tt-person and tt-webpage, used to add type to the topics “mario” and “page”; by this, we state that the topic corresponding to the string “Mario” is describing a person and the topic whose subject is “http://www.mario.org” represents a web page. Then we define the association type at-author, used to add a type to the association between “mario” and “page”. Every member of this association specifies the role it assumes. In this case, the role and the type of the topic are the same.

For the purpose of clarifying the concept of PSI, below is an example of an XTM document containing PSIs for the subject “apple” [Pep02]:

<?xml version="1.0" encoding="iso-8859-1"?>
<topicMap
    id="fruits-psiset-tm"
    xmlns="http://www.topicmaps.org/xtm/1.0/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
<topic id="apple">
  <subjectIdentity>
    <subjectIndicatorRef
      xlink:href="http://psi.fruits.org/fruits.psi#apple"/>
  </subjectIdentity>
  <baseName>
    <scope>
      <subjectIndicatorRef
      xlink:href="http://www.topicmaps.org/xtm/1.0/language.xtm#en"/>
    </scope>
    <baseNameString>apple</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
      <!-- occurrence typed by a PubSubj TC published subject -->
      <subjectIndicatorRef
      xlink:href="http://psi.topicmaps.org/pubsubj/pubsubj.psi#description"/>
    </instanceOf>
    <scope>
      <!-- scope by language English -->
      <subjectIndicatorRef
      xlink:href="http://www.topicmaps.org/xtm/1.0/language.xtm#en"/>
    </scope>
    <resourceData>Some suitable indication of the concept of apple 
    (the fruit), perhaps with an illustration.
    </resourceData>
  </occurrence>
</topic>

...

</topicMap>

Converting between RDF and Topic Maps

Both models are suitable for solving the knowledge management problem, but the ideas that inspired them were different. RDF has been developed with the Semantic Web in mind, while Topic Maps was born as a practical way to build indexes of information resources. Nonetheless, both standards try to achieve a practical and compact way to describe and relate generic entities.

In particular, both RDF and Topic Maps:

  • Allow the definition of abstract and concrete entities
  • Have a type system to build hierarchies
  • Allow the definition of semantic relationships between entities (more precisely, they allow the definition of typed relationships)
  • Allow a labeled-arc graph representation of relationships and classes

The most significant difference between the two is the expressive power of the respective constructs; while Topic Maps offers a more sophisticated set of basic constructs to use in the definition of metadata, RDF is easily extensible. The issue is, therefore, between using the predefined set of predicates, as opposed to having each user redefine new ones for the same purposes.

The problem of finding a mechanism that allows us to go back and forth between the two languages is, therefore, an important one, and needs to be studied carefully. In Subsection “META: the tool for metadata conversion” we provide a short introduction to our approach, while here we list a few existing attempts in the literature.

[Moo01] was the first attempt to offer integration between the two languages. This did not result in an implemented system, but rather in a general roadmap for all possible conversion mechanisms yet to be implemented. The strategy followed was very straightforward, since it was based on the definition of PSIs to represent RDF concepts for the RDF to Topic Maps conversion, and on the definition of RDF predicates to represent Topic Maps constructs in the other direction. The mapping was not completely defined, as it left unanswered some issues (e.g., the translation of variant elements and of resourceRef and subjectIndicatorRef) of some importance for a practical use of the approach. The lack of reversibility of the conversion might also be considered a limitation.

In [LacDec01], the authors defined a mapping model from Topic Maps to RDF based on the “Topicmaps.net’s Processing Model for XTM 1.0” presented in [NewBie01], which defined a set of rules for processing Topic Map documents in order to reconstitute the meaning of the information they were intended to convey to their recipients. Basically, [LacDec01] defined an RDF Schema that represents the Topic Map model with RDF statements. The approach, as claimed by the authors, is complete and reversible. However, the paper did not mention anything about the other direction of conversion.

Also based on [NewBie01] is [Ogi01] which introduced some of the ideas we have exploited in our work, such as the possible representation of Topic Map association, scoped names, and occurrences with resources, instead of properties as they are usually conceived. Nonetheless, this work also fails to take into account the other direction of the mapping.

The common drawback of all the works listed here is, in our opinion, the rather awkward appearance of the documents coming out of the conversion. Even if it is sometimes necessary to expect the applications managing these documents to have some prior knowledge of the conversion schema in order to preserve information correctly, we consider the readability of the documents produced an important aspect of the conversion process.

The most evolved work on integration has been presented by [Gar01], which takes integration further than the previously discussed papers and proposes a solution similar to ours.

In particular, for RDF to Topic Maps conversions, [Gar01] draws the conclusion that a generalized mapping for any RDF model is not possible because of the many suitable alternatives that could be chosen. It therefore follows a complete mapping strategy guided by a file that contains the rules for the translation of every single RDF property. Each entry in the mapping file corresponds to an RDF property of the source document and defines its mapping into Topic Map constructs. This is similar to our approach, but it does not take into account the similarities of standard RDF and RDFS predicates with Topic Map constructs, forcing the mapping to be defined even when it is clearly implied by the common meaning of the constructs of both languages.

For Topic Maps to RDF conversions, on the other hand, [Gar01] suggests two different approaches. The first one is a straightforward modeling of each Topic Map construct into RDF classes and predicates, as already proposed by [LacDec01] and [Ogi01]. The main drawback of this approach, as mentioned, is that the result is very different from a native RDF document.

The second approach again uses a mapping file to guide the translation on a case-by-case basis. Because a standard TMQL [Topic Maps Query Language] has yet to be defined, a proprietary query language is used to extract information and produce RDF statements in accordance with the rules to be applied when a match occurs. The results achieved are very good, but again the translation is completely rule-driven, and it does not make use of implicit model-based commonalities.

META: a tool for metadata conversion, editing, and navigation

In this section, we present META, a collection of three tools that we have developed as a contribution to the metadata management problem. META allows the creation and navigation of documents containing metadata in an environment where RDF and Topic Maps need to coexist.

The main goal we pursue is to provide a high-level view of metadata technologies, by which even users without specific technical knowledge of RDF and Topic Maps will be able to create, edit, and browse these kinds of documents.

In order to obtain this, META hides all of the syntax details of these languages. Users are faced with simple GUIs, allowing them to leverage the power of the underlying technologies. Due to the rather low level of maturity reached by these standards, we have been forced to extend them in some aspects that we will discuss in the next paragraphs.

  • The converter: the presence of two different standards makes integration a primary concern. We have built a conversion tool in order to allow the “universe” of knowledge-based management systems not to be partitioned by the presence of two competing standards. The translation provided by the converter follows generic rules (described in Subsection “META: the tool for metadata conversion”) and, in some cases, may not be the desired one. Accordingly, in order to obtain a more customized translation, users who know RDF and Topic Maps can pilot the conversion sequence.
  • The editor: editors currently available for XTM and RDF are essentially XML editors based on the grammar of the languages. Even if tools like these help in the creation of metadata documents, the users must be expert in the original languages and must understand the concepts behind each of their constructs. We have developed an editor that takes advantage of schema information for the particular context; through the use of the schema, we can reach a higher degree of transparency because users are not forced directly to obey syntax constraints.
  • The navigator: the main goal in developing this tool has been to provide a uniform view of documents written in either RDF or Topic Maps. Moreover, as for the editor, we have hidden the internal structure of these documents, so users are not forced to know either the syntax of the specification languages or the meaning of each XML tag.

META: the tool for metadata conversion

For interoperability between applications using different standards, it would be useful to have a conversion model between the XTM and RDF/XML syntaxes. The basic idea behind our work has not been to develop a new model that could be used to unify the two, but rather to find a way to translate the concepts expressed in one format into the other. Unfortunately, because of the different nature and history of RDF and Topic Maps, that has not always been easy or even possible.

Basically, there are two different approaches to tackle this problem. The first is to describe one model in terms of the other. This approach is problematic in that the converted document is necessarily very different from the one that would have been written directly in the destination language, and hardly readable. The second approach tries to identify, wherever possible, the semantic equivalence of the respective constructs and define a model-to-model mapping. The problem with this approach is that this is not always possible, and often requires some case-by-case approaches that may have no general usefulness.

We consider the best approach to be a hybrid of the two. For example, if we have to translate the scope element from a Topic Map, we can define an RDF predicate with which we associate the same meaning, and then apply it to the corresponding entity.

We can summarize the fundamental issues of a conversion schema as:

  • Semantic completeness: how much of the information is preserved in the transformed document?
  • Readability: how close is the document produced by the translation process to what it would have been if written using the native language?
  • Reversibility: how different from the original is a document that is converted twice?

Topics, resources, URIs, and names

In a natural model-to-model mapping, topics can be seen as RDF resources, and vice versa. However, the mapping is not trivial, as the correct identification of the entity can vary on a case-by-case basis. Topic Maps uses subjectIdentity elements to annotate the subject of a description, which can be either an addressable resource (indicated in XTM by a specific tag) or a subject indicator (a URI referring to a resource that indicates to a human what the subject of the topic is). RDF, on the other hand, uses a unique URI to refer to the subject of a description, indicated by the attribute rdf:about or, for abstract resources, by rdf:ID.

So, to pass from Topic Maps to RDF, the straightforward solution is to use subject resources as the URIs of statement subjects, to create a random ID for those topics that do not have a resource as subject, and, where present, to specify the subject indicator with a new predicate that is assigned the same meaning.

</rdf:Description>

<rdf:Description rdf:ID="page_assoc001">
 <rdf:type rdf:resource="#tt-webpage"/>
 <rdfs:isDefinedBy rdf:resource="#page"/>
</rdf:Description>

<rdf:Description rdf:ID="mario_assoc001">
  <rdf:type rdf:resource="#tt-person"/>
 <rdfs:isDefinedBy rdf:resource="#mario"/>
</rdf:Description>

The drawbacks here are low readability and the use of a particular RDF Schema instance, which forces all applications to know it for a correct understanding of the converted statements. On the other hand, this conversion is always applicable, so it represents the default behavior of our tool.

The other side of the mapping, from RDF to Topic Maps, is also difficult. The problem here lies in the correct identification of the nature of the predicate. Unless RDF Schema information specifying the range and domain of each predicate is available for the original document, it is not always possible to determine whether a statement relates two entities or whether the object of the statement should be considered an attribute of the subject. When a predicate is known to relate two entities, the mapping from predicate to association type is immediate. The member roles can be deduced from the constraint information relative to the predicate, disambiguating the produced association.

It is worth noting that, in the absence of schema information about the properties, their nature has to be deduced from the context. This follows from the lack of a clear separation between relations and attributes in the RDF standard.

The solution we propose uses a default behavior, which can be tailored to treat a predicate as an association and vice versa. In this manner, the user can pilot the translation to a finer granularity, as shown in [GenPir01].

Topic Maps has the notion of occurrences, which are used to associate name-value pairs with topics. The value of an occurrence may be a resource identified by its URL. RDF, as discussed before, uses predicates for this purpose, too. Here again the main problem lies in recognizing whether a property is an attribute or a relation. RDF does not have a construct to reify resources, so it is sometimes hard to tell the difference between a relation involving two resources and an attribute, unless the value of the property is a literal.

The RDF and RDFS specifications give us a useful set of predefined predicates, and these can be mapped to occurrences or relations as appropriate, but for user-defined predicates we have to rely on schema information. In particular, any available information about range and domain can be exploited. That is, if we are told that the range of a user-defined predicate is a particular rdfs:Class, we can assume the predicate is intended to establish relations between entities; otherwise, we can assume that the property is an attribute. The behavior of our engine (when converting RDF to Topic Maps) follows this rule, mapping a predicate to an occurrence type if nothing is known about its range, and resolving it otherwise to an association type as explained in the previous paragraph.
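
The default rule described above can be sketched in a few lines of Python. This is not META's actual code: the function name, the dictionary of known ranges, the example predicate URIs, and the optional overrides argument (standing in for the user-piloted translation) are all hypothetical, and treating a declared range of rdfs:Literal as an attribute is an assumption of this sketch.

```python
RDFS_LITERAL = "http://www.w3.org/2000/01/rdf-schema#Literal"


def classify_predicate(predicate, ranges, overrides=None):
    """Decide whether an RDF predicate maps to an 'association' type
    or to an 'occurrence' type in the generated topic map.

    ranges    -- predicate URI -> declared rdfs:range, from the schema
    overrides -- optional user choices that take precedence (hypothetical)
    """
    if overrides and predicate in overrides:
        return overrides[predicate]
    declared_range = ranges.get(predicate)
    # Nothing known about the range: fall back to the attribute reading.
    if declared_range is None:
        return "occurrence"
    # Assumption of this sketch: a literal range also denotes an attribute.
    if declared_range == RDFS_LITERAL:
        return "occurrence"
    # The range is a class of resources: the predicate relates two entities.
    return "association"


# Hypothetical schema information for three user-defined predicates.
ranges = {
    "http://example.org/schema#maintainedBy": "http://example.org/schema#Person",
    "http://example.org/schema#title": RDFS_LITERAL,
}

for predicate in (
    "http://example.org/schema#maintainedBy",
    "http://example.org/schema#title",
    "http://example.org/schema#comment",
):
    print(predicate, "->", classify_predicate(predicate, ranges))
```

Passing an overrides dictionary lets a user who knows both languages force a different reading for a single predicate without changing the default behavior for the others.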

The other side of the mapping does not present these particular problems: occurrence types are simply resolved into predicates.

Type systems

Both standards have a fairly complete type system that allows the creation of class-instance and superclass-subclass relations among entities. RDF/RDFS does this with the predefined rdf:type predicate, which relates an instance to its class, and rdfs:subClassOf, which relates a subclass to its superclass. Topic Maps allows the typing of topics with other topics that act as classes. Unfortunately, there is no standard way to declare the class (or topic type, as it is usually called) nature of topics. However, this can be done using a PSI. The superclass-subclass relation, on the other hand, is easily rendered with an explicit association. With the mapping we have considered so far, these translations do not seem to present particular problems.
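
As an illustration only, the following Python sketch builds the XTM association that could render an rdfs:subClassOf statement. It assumes that the superclass-subclass, superclass, and subclass published subjects of XTM 1.0 (core.xtm) are used as the association type and role types; the text above does not say which PSIs META actually adopts. The function names and the topic identifiers are hypothetical.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Base URI of the XTM 1.0 core published subjects (assumed to be the PSIs used).
CORE_PSI = "http://www.topicmaps.org/xtm/1.0/core.xtm#"


def _add_ref(parent, tag, href):
    """Append an XTM reference element carrying an xlink:href attribute."""
    element = ET.SubElement(parent, f"{{{XTM_NS}}}{tag}")
    element.set(f"{{{XLINK_NS}}}href", href)
    return element


def subclass_association(subclass_id, superclass_id):
    """Render 'subclass rdfs:subClassOf superclass' as an XTM association."""
    association = ET.Element(f"{{{XTM_NS}}}association")
    instance_of = ET.SubElement(association, f"{{{XTM_NS}}}instanceOf")
    _add_ref(instance_of, "subjectIndicatorRef", CORE_PSI + "superclass-subclass")
    # One member per role; the role type is itself identified by a PSI.
    for role, topic_id in (("superclass", superclass_id), ("subclass", subclass_id)):
        member = ET.SubElement(association, f"{{{XTM_NS}}}member")
        role_spec = ET.SubElement(member, f"{{{XTM_NS}}}roleSpec")
        _add_ref(role_spec, "subjectIndicatorRef", CORE_PSI + role)
        _add_ref(member, "topicRef", "#" + topic_id)
    return association


# Hypothetical topic types: every student is a person.
association = subclass_association("tt-student", "tt-person")
print(ET.tostring(association, encoding="unicode"))
```

Because the association type and both role types are identified by published subjects, an application that knows those PSIs can map the association back to rdfs:subClassOf.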

Notes on the converter: readability and reversibility

META is still being tested, and our work is directed towards its enhancement and extension. We have already tested the converter on some Topic Maps and RDF documents, selected because their structure exhibits some significant aspects. The results obtained let us hope that the readability requirement is well satisfied. In particular, we needed only to change some identifiers that appear as numerical strings after the translation. As for the reversibility requirement, the test results are also acceptable.

META: the editor

Editors available nowadays for XTM or RDF are essentially XML editors based on the grammar of one of the two languages. Even if tools like these make the creation of metadata documents easy, users must understand the concepts behind each of these paradigms.

In our opinion, by taking advantage of schema information of a particular context, it is possible to reach a higher degree of transparency, since users are not forced directly to obey syntax constraints. The idea is to allow the creation of metadata resources that can be typed using the type systems defined in a schema. So, once the type is chosen, the editing permits the insertion of only those attributes and relations expected.

With the META editor, we show how it is possible to offer a common view of metadata technologies, hiding the syntactical aspects that inexperienced users need not know. The editor is schema dependent: to create new documents, users must choose a particular schema among those that are provided. The format of new documents depends on the metadata language used to write the schema. The binding between each document and its schema is kept within the document itself.

The following screenshot shows an entity of type Person. It is possible to note its names, the relations it takes part in, and its attributes. The left area of the editor contains an ordered list of all the available entities defined in the loaded document.

Figure 2: META: the editor

To offer a common interface for the editing of documents in the two formats, we have followed the same guidelines described for the conversion tool. Essentially, we have unified the view of metadata with the concepts of type, attributes, and relations. Adding a new entity to a document implies choosing a type for it. Consequently, all — and only — the attributes and the relations that the schema defined as applicable to an instance of the chosen type are available to the user.

In order to make this approach work, a complete definition of the schema is necessary. In particular, all the attributes and relations have to be constrained to the types they pertain to. Thus, the type system has to be defined accordingly, exploiting the hierarchical features it provides.
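
The applicability rule can be sketched as follows. This is a minimal illustration, assuming single inheritance and a hypothetical in-memory schema (the class TypeDef and the example names are not part of META); it only shows how constraining attributes and relations to types, together with the type hierarchy, determines what an editor may offer for a chosen type.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """A schema type with the attributes and relations declared on it."""
    name: str
    parent: "TypeDef | None" = None
    attributes: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def applicable(self):
        """Collect attributes and relations along the type hierarchy."""
        attributes, relations = set(), set()
        current = self
        while current is not None:
            attributes |= current.attributes
            relations |= current.relations
            current = current.parent
        return attributes, relations


def check_entity(type_def, used_attributes, used_relations):
    """Return the names an entity uses that its type does not allow."""
    attributes, relations = type_def.applicable()
    return (set(used_attributes) - attributes) | (set(used_relations) - relations)


# Hypothetical schema: Author inherits what is declared on Person.
person = TypeDef("Person", attributes={"birthDate", "photo"}, relations={"knows"})
author = TypeDef("Author", parent=person, relations={"wrote"})

allowed_attributes, allowed_relations = author.applicable()
rejected = check_entity(author, {"photo", "isbn"}, {"wrote"})
```

Here an Author may use everything declared on Person plus its own relation, while the undeclared attribute isbn is reported as not applicable.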

While RDF Schema provides all the necessary primitives for this task, Topic Maps does not yet have an equivalent constraint language. A language called TMCL [Topic Map Constraint Language] is under development by ISO, but no specification has been produced yet; only a draft requirements document has been published [Pep01], which defines what TMCL is supposed to provide. In the absence of a standard, we have adopted the concepts of this proposal and slightly extended them with what we needed to accomplish our task. Our implementation of the constraint language addresses all the issues involved in defining the applicability constraints of occurrences and associations in a Topic Map schema. Its main feature is the use of XTM itself to define schemas. Basically, it uses a template mechanism to define the types and the occurrences and associations pertaining to them.

We can define occurrence types, which represent our attributes, as topics typed with a PSI introduced for this purpose. Then we can define a topic type as a template, specifying in its XTM context all of the occurrence types we may apply to instances of that type.

For association types (i.e., our relations), we can again define topics typed by the PSI that represents association classes, and then define all of the association templates, instantiating them with one of these topics. The members of an association template are interpreted as the topic types that an association instance may have for each role.

With Topic Maps, we are allowed to specify multiple members for each association. In this case, the editor requires a user who wants to instantiate the association to specify an instance entity for each member.
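
The check implied by association templates can be sketched as follows. This is a minimal illustration under stated assumptions: the template is reduced to a mapping from roles to required topic types, the type hierarchy uses single inheritance, and all names (check_association, the example topics) are hypothetical rather than taken from META.

```python
def is_instance_of(topic_type, required_type, supertypes):
    """True if topic_type equals required_type or inherits from it."""
    while topic_type is not None:
        if topic_type == required_type:
            return True
        topic_type = supertypes.get(topic_type)
    return False


def check_association(template, members, topic_types, supertypes):
    """Return a list of problems; an empty list means the instance is valid.

    template:    role -> required topic type
    members:     role -> topic id chosen by the user
    topic_types: topic id -> topic type
    supertypes:  topic type -> its supertype (hypothetical single inheritance)
    """
    problems = []
    for role, required_type in template.items():
        if role not in members:
            problems.append(f"missing member for role '{role}'")
        elif not is_instance_of(topic_types.get(members[role]), required_type, supertypes):
            problems.append(f"'{members[role]}' is not a {required_type}")
    for role in members:
        if role not in template:
            problems.append(f"unexpected role '{role}'")
    return problems


written_by = {"work": "Book", "writer": "Person"}
topic_types = {"hamlet": "Book", "shakespeare": "Author"}
supertypes = {"Author": "Person"}

complete = check_association(
    written_by, {"work": "hamlet", "writer": "shakespeare"}, topic_types, supertypes)
incomplete = check_association(written_by, {"work": "hamlet"}, topic_types, supertypes)
```

An association instance is accepted only when every role of the template is filled by a topic of the required type or of one of its subtypes.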

META: the navigator

Many search engines provide a catalogue of web pages, indexed according to their content. In this way, a semantic tree can be used to reach the relevant resources. Unfortunately, these catalogues allow only navigation by subject and cannot express other possible relationships among these documents. In other words, we lose all the semantic connections that cannot be captured by the indexing system but can be expressed using metadata technologies.

Moreover, the resource content is often the final destination of the semantic navigation; the user cannot access other metadata about the resource (such as the date, the author, or the bibliography) without opening and reading it. All this information could be contained in a metadata block associated with the document and accessed through a semantic navigator, allowing richer search and access.

Our main objective in developing the navigation tool has been to provide a uniform view of documents written in RDF and Topic Maps. Since we hide the internal structure of these documents, users are not forced to know the syntax of the specification languages nor the meaning of each XML tag.

The following screenshot shows the navigator browsing the categories of Java languages. We can see three different sections:

  • The main categories defined in the opened document
  • The subcategories of the selected main category
  • The available instances for the current category
Figure 3: META: the navigator

As specified in the previous section (Section “META: the tool for metadata conversion”), we limited the metadata that can be applied to a resource to either an attribute (i.e., a property whose value is a literal or a link to a resource, such as the title of a book) or a relation (i.e., associations among two or more typed entities).

The following screenshot shows the details for a resource of type Person. It is possible to note its names, the relations it takes part in, and its attributes. By clicking on any other member of the relations listed here, the user can view the details of the selected item.

Figure 4: META: the navigator. Entity browsing

Attributes and relations are typed. Every type used for this purpose must have a human-readable name in order to be properly presented to a user. Both RDF and Topic Maps natively support a set of attributes, such as the type of a resource, its name, and a link to the resource that is the subject of the metadata entity. Standard XML tags are therefore used to represent these attributes.

The attributes defined in schemas need to be interpreted by the navigator in order to be properly displayed to the user. In most cases, this tool will not know their meaning. For example, if we define the attribute photo for the class person and the attribute cover for the class book, the navigation software should recognize them as links to images and show them as graphic elements. In order to obtain this behavior, it will be necessary to standardize ontologies and use public schemas which define the most common entities (such as the Dublin Core [DCMI]). This way, document management will be even more standardized and the presentation more effective. In order to support this feature, we developed a small set of public subjects which can be used to type attributes that relate to multimedia contents (i.e., images, audio or video resources).
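
A minimal sketch of this lookup is given below. The subject identifiers are placeholders under example.org, since the identifiers of the public subjects are not listed here, and the function name and the presentation labels are likewise hypothetical.

```python
# Hypothetical published subject indicators for multimedia attribute types;
# the identifiers actually used by META are not given in the text.
PSI_IMAGE = "http://example.org/psi/media#image"
PSI_AUDIO = "http://example.org/psi/media#audio"
PSI_VIDEO = "http://example.org/psi/media#video"

PRESENTATION = {
    PSI_IMAGE: "image",
    PSI_AUDIO: "audio player",
    PSI_VIDEO: "video player",
}


def presentation_for(attribute, attribute_types):
    """Choose how to display an attribute from the subject that types it.

    attribute_types maps an attribute name to the PSI typing it, if any.
    Attributes without a known public subject fall back to plain text.
    """
    return PRESENTATION.get(attribute_types.get(attribute), "text")


# The photo of a person and the cover of a book are both typed as images.
attribute_types = {"photo": PSI_IMAGE, "cover": PSI_IMAGE}
```

With such a table, a navigator does not need to know what photo or cover mean; it only needs to recognize the public subject that types them.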

Every RDF property defined in a schema is interpreted as:

  • A relation if, in its definition, the predicate rdfs:range has a typed entity (of type rdfs:Class) as value
  • An attribute otherwise
whereas in XTM:
  • Every association element corresponds to a relation among its members. Every association is typed with an association-type topic whose name will be used to display it.
  • Every occurrence element of a topic is mapped to an attribute.
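
The RDF half of the rule above can be sketched as follows. This is a minimal illustration that represents a schema as a list of (subject, predicate, object) tuples with prefixed names; the function name and the example schema are assumptions, not part of the navigator.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"
RDFS_CLASS = "rdfs:Class"
RDF_PROPERTY = "rdf:Property"


def classify_properties(triples):
    """Label each schema property as 'relation' or 'attribute'.

    A property is a relation when the value of its rdfs:range is itself
    declared as an rdfs:Class in the same schema; otherwise an attribute.
    """
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == RDFS_CLASS}
    properties = {s for s, p, o in triples if p == RDF_TYPE and o == RDF_PROPERTY}
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    return {
        prop: "relation" if ranges.get(prop) in classes else "attribute"
        for prop in properties
    }


# Hypothetical schema: author points to a class, title to a literal,
# and nickname declares no range at all.
schema = [
    ("ex:Person", RDF_TYPE, RDFS_CLASS),
    ("ex:Book", RDF_TYPE, RDFS_CLASS),
    ("ex:author", RDF_TYPE, RDF_PROPERTY),
    ("ex:author", RDFS_RANGE, "ex:Person"),
    ("ex:title", RDF_TYPE, RDF_PROPERTY),
    ("ex:title", RDFS_RANGE, "rdfs:Literal"),
    ("ex:nickname", RDF_TYPE, RDF_PROPERTY),
]
kinds = classify_properties(schema)
```

A property without a declared range falls into the attribute case, which matches the "otherwise" branch of the rule.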

Conclusion

At this stage of their evolution, neither RDF nor Topic Maps seems able to displace the competing language, and it seems likely that the two technologies will coexist in the future of the Semantic Web environment.

Unfortunately, converting from one format to the other is not as painless as it should be. Many natural constructs in RDF have no corresponding structure in Topic Maps and vice versa. We have shown that, in many cases, the results of automatic conversion end up being particularly awkward and hard to read.

META, with its collection of three tools for the management of meta-information both in RDF and Topic Maps, seems to be an important step in providing an easy integration of the two languages.

Of course, we have found that conversion is eased by an appropriate choice of constructs in the original language and by the systematic use of schemas whenever possible.

As noted earlier in this paper, we have found it necessary to extend RDF in some ways in order to obtain, in some cases, the same expressive power as Topic Maps; this represents the main disadvantage of META. On the other hand, META is completely transparent to the user, which is especially important for inexperienced users. Furthermore, using META should be highly beneficial, primarily because this set of integrated tools allows users to work in an environment where RDF and Topic Maps are mixed.

To date, the META editor does not provide any facility for specifying schemas, which must be provided in advance. This is due to the fact that neither RDF nor Topic Maps has a stable document for the description and specification of schemas for its language. We believe that a tool for the visual specification of schemas is an indispensable addition to the current set of tools, and its development is one of our aims for the near future.

Nonetheless, the current set of tools seems to be able to provide much more sophisticated and integrated functionalities than other existing applications.


Bibliography

[DCMI] DCMI, Dublin Core Metadata Initiative, http://dublincore.org.

[Gar01] Garshol, Lars Marius, Topic maps, RDF, DAML, OIL: A comparison, http://www.ontopia.net/topicmaps/materials/tmrdfoildaml.html.

[Gar02] Garshol, Lars Marius, Living with Topic Maps and RDF, 10 March 2003, http://www.ontopia.net/topicmaps/materials/tmrdf.html.

[GenPir01] Gentilucci, Riccardo, and Marco Pirruccio, Metadata on the Web: on the integration of RDF and Topic Maps, UBLNCS, University of Bologna Technical Report, 2002, in print.

[LacDec01] Lacher, Martin, and Stefan Decker, On the integration of Topic Map data and RDF data, In Extreme Markup Languages 2001, Montreal, Canada, http://www.semanticweb.org/SWWS/program/full/paper53.pdf.

[LHL01] Berners-Lee, Tim, James Hendler, and Ora Lassila, “The Semantic Web”, Scientific American 284, 5 (May 2001), 35-43.

[Moo01] Moore, Graham, RDF and TopicMaps: An Exercise in Convergence, In XML Europe 2001, Berlin, http://www.topicmaps.com/topicmapsrdf.pdf.

[NewBie01] Newcomb, Steven R., and Michel Biezunski, Topicmaps.net’s Processing Model for XTM 1.0, version 1.0.2, 25 July 2001, http://www.topicmaps.net/pmtm4.htm.

[Ogi01] Ogievetsky, Nikita, XML Topic Maps through RDF glasses, In Extreme Markup Languages 2001, Montreal, Canada, http://www.cogx.com/rdfglasses.html.

[Pep01] Pepper, Steve, Draft requirements, Topic Map Constraint Language Requirements, 4 April 2003, http://www.isotopicmaps.org/tmcl/requirements.html.

[Pep02] Pepper, Steve, Draft examples for inclusion in General Requirements and Recommendations for Published Subjects, 5 June 2002, http://www.ontopia.net/tmp/psi-examples.html.

[RDF] World Wide Web Consortium, Resource Description Framework (RDF) Model and Syntax Specification, W3C Recommendation, 22 February 1999, http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.

[RDF02] Intellidimension Inc., RDF Gateway, http://www.intellidimension.com.

[RDFXML] World Wide Web Consortium, RDF/XML Syntax Specification (Revised), W3C Working Draft, 25 March 2002, http://www.w3.org/TR/rdf-syntax-grammar.

[TM] Joint Technical Committee 1 JTC1, ISO/IEC 13250 Topic Maps, 3 December 1999, http://www.y12.doe.gov/sgml/sc34/document/0129.pdf.

[XTM] TopicMaps.org Authoring Group, XML Topic Maps (XTM) 1.0 Specification, http://www.topicmaps.org/xtm/index.html.


