Easy RDF For Real-Life System Modeling

Thomas B. Passin
pub1@tompassin.net

Abstract

This paper provides an illustrated answer to the question "How can I start creating RDF right now for my real-life problem?" The work is based on lessons learned during a real-life project to capture enterprise architecture (EA) system modeling information during the preliminary analysis phase, when the concepts were fluid and constantly evolving. RDF is shown to be extremely well-suited for this purpose.

To be usable by non-specialists who need to complete such tasks, the emphasis must be on simplicity and flexibility. The data elements need to change and grow over time without breaking the model. The data must be extracted and displayed in many different ways, for different purposes. The key to making RDF practical was to use a carefully defined subset, such that (1) the subset is simple and contains few complex variations, and (2) the RDF can be written in the form of simple, familiar indented lists.

The paper describes such an RDF subset, shows how it was applied to the EA model, introduces a simplified textual format, and illustrates how using this subset of RDF makes it easy to follow the evolution of a model. By use of XSLT transformations, the RDF data is presented in many useful forms, such as data collection templates and class hierarchies, which are illustrated. The paper also demonstrates how to modularise and further simplify the RDF using a DTD and entities.

Finally, the paper discusses how the RDF dataset could be converted into other forms, such as to XMI for import into UML tools, or to a relational database schema.

Keywords: RDF; XSLT; Modeling

Table of Contents

1.0 Introduction
1.1 Advantages of The Recommended RDF-XML Subset
Advantages Of Using RDF
Advantages Of Using RDF-XML Syntax
1.2 RDF Strengths
1.3 Background Of This Work
1.4 What This Paper Covers
2.0 RDF And System Modeling
2.1 The Real World Of Modeling
2.2 Typical Modeling Tools (And Their Problems)
2.3 Why RDF Could Be Useful
2.4 Hindrances To RDF
2.5 What's Needed
2.6 This Paper's Method
2.7 The Example Model
3.0 The Practical RDF Subset And How To Format It
3.1 Indented Lists
Conventions for Structured Text
The use of rdf:about vs. rdf:ID
3.2 Striped Format
3.3 Typed Nodes
3.4 Blank Nodes and rdf:Resource
3.5 Identifiers and Namespaces
3.6 Identifiers and XML Entities
3.7 Classification And OWL
3.8 A Working Example
3.9 Displaying The Model
3.10 Using A Validator
3.11 Using Modules
3.12 Example Of A Complete Set of Files and Modules
4.0 How RDF Adjusts To Evolving Models
4.1 Typical Kinds Of Evolutionary Changes
4.2 Renaming
4.3 Adding And Splitting Structure
4.4 Reclassifying
4.5 Accidental Duplication
5.0 Case Study - Populating And Evolving The Model
5.1 Example: Funding Profiles
5.2 Discussion
6.0 Transforms and Templates For Presenting The Model
6.1 Data Presentation - Essential For Communication
6.2 XSLT For Extracting and Formatting
6.3 Data Collection Templates
6.4 Examples
6.5 Graphical Displays
7.0 Using The Data
7.1 After The Model Stabilizes
7.2 Staying With RDF
8.0 Limitations and Alternatives
8.1 Limitations
8.2 Alternative Approaches
9.0 Conclusions

Thomas B. Passin

Thomas Passin has been working with XML-related technologies since 1998. He helped to create the XML version of the message set in SAE J2354 Advanced Traveler Information Systems, and has created a number of demonstration applications that use XML, XSLT, and Python technologies together. He also consults at work about XML and XSLT matters, and is active on a number of related discussion lists.

He is the author of the book "Explorer's Guide To The Semantic Web".

Mr. Passin studied physics at the Massachusetts Institute of Technology and the University of Chicago.

Thomas B. Passin [Principal Systems Engineer; Noblis]

Extreme Markup Languages 2007® (Montréal, Québec)

Copyright © 2007 Thomas B. Passin. Reproduced with permission.

1.0 Introduction

The Resource Description Framework (RDF) is an attempt to provide a language that can play the role of a standard distributable knowledge representation medium. Much as the Internet integrates disparate networks by using a common protocol to connect their interfaces, and the World Wide Web integrates a wide range of resources by using the HTTP protocol, the HTML page description language, and hyperlinks, RDF in principle could serve as an integration medium for data.

This paper presents an initiative to use RDF in actual projects - as opposed to mere demonstration efforts. The immediate background was the modeling of a large and rather amorphous enterprise architecture (EA). The work reported here represents one approach to making the use of RDF tractable for one or a few individuals, on a relatively small scale. Indeed, the use of RDF turns out to have a number of definite benefits, especially because its flexibility matched the fluid nature of the EA modeling in its early stages.

This paper assumes a modest knowledge of the RDF data model and XML syntax. For an introduction to RDF, see chapter 2 of [Passin 2004] or [W3C RDF Primer 2004]. Some RDF technicalities are discussed, but the emphasis is on particular ways to structure and use RDF, such that RDF can be a real asset for real projects. A high-level acquaintance with [OWL] would be helpful but not essential. The work emphasizes a particular subset of RDF-XML, a structured text manner of writing it, and the use of XSLT transformations for creating display views of the data.

1.1 Advantages of The Recommended RDF-XML Subset

This section summarizes the reasons why the recommended subset of RDF, and the use of XML, are especially advantageous. The subset itself is described and illustrated in Section “3.0 The Practical RDF Subset And How To Format It”.

Advantages Of Using RDF

The recommended layout of the RDF (a variety of what is sometimes called striped) closely mimics the way many people tend to note down information - as indented lists. The brain naturally seeks to arrange information in a hierarchical manner (even when it may not be truly hierarchical), so using this approach is a good fit to natural human abilities.

Data can be added to the dataset without regard to whether all the proper definitions have been made beforehand. RDF processors will infer the existence of entities (i.e., resources) from any references to them. An OWL-aware processor might even be able to infer most of the class structure, were it left implicit.

Changes are easy to make by using a text editor. If an identifier needs to be changed globally, a global search-and-replace will do the job. If the name of an entity (as opposed to its identifier) needs to be changed, it can be changed in one place, the rdfs:label statement where the name is established.

As will be seen, the ease in making changes, together with the ability of RDF processors to infer the existence of entities that have been referenced but not yet defined, make RDF very well suited to track the dynamic, changing character of the underlying models (such as enterprise architecture models) as they evolve.

Advantages Of Using RDF-XML Syntax

Although RDF's XML syntax is often considered a liability, for this work it has advantages over alternative, non-XML syntaxes (e.g., NTriples or Turtle). The subset recommended here can be written, as will be seen, to be nearly isomorphic to naive textual indented lists, and this is quite important to this approach.

XML comes with machinery for splitting apart the data into separate modules, which simplifies the creation, evolution, and maintenance of a dataset as it grows. XML comes with the capability for defining shortcuts - the namespace prefixes and entities, the default namespaces and xml:base - that greatly simplify typing and reading the data. Other syntaxes may have some of these benefits, but none has them all.

When the data is in XML form, it is possible to use XSLT to extract subsets of the data and to format it for display. As will be seen in Section “6.0 Transforms and Templates For Presenting The Model”, this capability is extremely useful. Note that XSLT is a practical tool only because the approach uses a highly restricted subset of the full RDF-XML syntax. The full syntax has so many optional variations that it is extremely difficult to write suitable XSLT transforms. Also, the hierarchical form allows the transformation author to make certain simplifying assumptions.

Finally, this paper presents a simplified non-XML format that mimics the naive text indented list form even more closely, is isomorphic to the RDF-XML subset, and is easily transformed into the RDF-XML form advocated here. This syntax is in essence a structured text format for creating the dataset. By using it, one can have all the advantages described above for using XML, and simplify the appearance of the data even more. Even without this simplified syntax, the approach described here is still valuable.

1.2 RDF Strengths

RDF, aside from the particular subset recommended in this paper, can represent virtually any kind of model one is likely to come across. Enterprise Architecture diagrams, conceptual data models, organization models, and the like are usually represented graphically with boxes and lines, and all can be modeled with RDF.

RDF representations are basically equivalent to highly normalized relational data models, but unlike relational tables, no tables have to be designed and constructed before data can be stored. Thus, the use of RDF brings considerable flexibility, since data can be stored as the model develops. This is extremely beneficial when the representation task is not well known, so that the model is rapidly evolving.

RDF uses universal identifiers. While other approaches, such as relational databases, could do so, it is not common practice. Many if not most modeling tools construct their own non-universal identifiers. (The EA modeling tool Metis is an exception: not only does it use an RDF-like triple model, but it also uses URIs to identify its entities.)

With RDF, not all entities (resources in RDF-speak) need to be identified. Some, the so-called bnodes, can remain anonymous. This too leads to flexibility and potential ease of use, since the user does not have to devise arbitrary identifiers for them.

RDF can make use of OWL, the W3C's Web Ontology Language, to define its vocabulary. This brings with it the ability to make use of other vocabularies in an integrated manner.

A further, non-trivial benefit is that parsers and validators for RDF are freely available.

1.3 Background Of This Work

The work described here arose from attempting to use RDF in ordinary work situations. This thread of effort has spanned a number of projects, including a rudimentary skeleton for the description and autodiscovery of networks of information repositories. Most recently, the author was tasked to describe the existing EA of a large, amorphous, poorly defined system of systems. As the work progressed, the customer requested a set of templates for collecting information to populate the architecture.

It was necessary to extract the relevant entities and properties from the modeling diagrams that had been manually developed, to present the information as some kind of "template", and to have a ready way to accumulate and work with the flood of populated templates that was anticipated to arrive in due course. Since the diagrams were rather high level, they did not contain all the real-world properties and relationships that would need to be captured. Hence the need for a simple, flexible and expandable approach. Since modeling tasks with similar characteristics are common, in disparate areas including conceptual database modeling, high level object modeling, process modeling, and the like, the results should be useful in many situations.

1.4 What This Paper Covers

Section “2.0 RDF And System Modeling” discusses real life issues in modeling tasks such as the EA task sketched above, how RDF could be useful, and outlines the RDF-based system that the author developed. Section “3.0 The Practical RDF Subset And How To Format It” presents a simple, workable subset of RDF - more properly, of RDF-XML syntax - that is the basis for the approach.

Section “4.0 How RDF Adjusts To Evolving Models” covers the typical evolution of a model, and how the use of RDF is especially beneficial here. Section “5.0 Case Study - Populating And Evolving The Model” walks through examples of populating and evolving a model that is based on the actual task, although simplified for the purposes of exposition. Section “6.0 Transforms and Templates For Presenting The Model” shows how the data may be transformed into a variety of presentations, including templates for collecting further data. Section “7.0 Using The Data” considers how the modeling data might be used after the model has stabilized and been partially populated, including how it could be used by other modeling tools. Section “8.0 Limitations and Alternatives” contains concluding remarks.

2.0 RDF And System Modeling

For the purposes of this paper, the terms modeling and system modeling are taken to include common conceptual modeling activities, such as modeling EAs and system architecture, high level system engineering models, conceptual database design, high level UML modeling, and the like.

2.1 The Real World Of Modeling

Early in a project, high level modeling takes place when the basic concepts are not yet firmly established. Indeed, the problem space is likely not to be well understood. Understanding comes with the work of modeling and attempting to populate the models with relevant information. So the models themselves are prone to change. In many modeling tasks, the way in which the highest levels are organized may be constrained by previously publicised reports, and even subject to political pressures. The models must then fit the required forms, yet be somehow translatable into proper engineering forms.

In many projects, work in progress must be repeatedly displayed, either to other engineers, to management, or to customers and other high level interested parties, even though the work is in a state of flux.

Later, as the work progresses, there is often an effort made to populate the models with such data as can be obtained or generated. Invariably, these attempts uncover weaknesses or mistakes, which in turn require the model to be adjusted, extended, or refactored. Ultimately, the models stabilize, and become the basis for further levels of engineering development. At this stage, a moderately large populated EA model might come to contain some thousands or tens of thousands of items (more for larger enterprises), many of which would be related together in numerous ways.

This paper is especially concerned with assisting the early, fluid phase of work. During this phase, even the basic ontology may be uncertain and incomplete.

2.2 Typical Modeling Tools (And Their Problems)

Modeling tasks like these can be supported by a number of software tools. The best known are probably the database design tools, such as ERwin. These tools have been highly developed through many years. They can consume and produce a number of flavors of SQL to construct database schemas, and can even interact directly with databases to create the tables, keys, and constraints required to implement the design. For database design jobs, these are excellent, though specialized, tools. We will not be concerned with them, except to note that it would be very good to have affordable tools that are as effective for more general purpose modeling.

Unified Modeling Language (UML) tools exist, such as Rational Rose, and some of them are very capable. There is an XML-based interchange language, XMI, that such tools normally support. These tools are usually intended for the design of object oriented software. They can be used for other purposes, but they are usually less than ideal. It is fairly easy to change a design before it gets too complex. It is less easy to make significant changes after a model becomes somewhat populated.

One is also generally limited in the types of data presentation that can be generated. To do more, one could export the data as XMI and process it further, but XMI files tend to be complex, hard to understand, virtually impossible to create or modify by hand, and contain a lot of information that is not needed for high level modeling.

As a result, UML tools tend to be unsatisfactory for many modeling tasks, such as EA modeling. Also, the most capable tools are very expensive, tend to require extensive training, and contain far more features than are needed for the level of modeling under consideration here.

There are also some software packages specifically intended to support EA modeling, such as Metis or System Architect. These again tend to be very expensive and to require extensive training. They also usually store data in proprietary formats.

With all the software tools available, some of which were mentioned above, the fact is that many high level models, possibly most, are created in drawing and presentation programs like PowerPoint or Visio, and data is captured using spreadsheets. This is often the easiest way to get started and requires the least expense and training, and the tools are the most familiar to a large number of people. However, it is very hard to do anything with these graphical models once they have been created.

Finally, it often happens that either the models, or the collected data, reside in a word processing document. In principle, it can be possible to extract the data using carefully written macros or external programs. However, the success of this approach depends on the document being structured or styled consistently. Such consistency has been found to be difficult to obtain.

2.3 Why RDF Could Be Useful

RDF is unusually flexible because terms and concepts do not have to be defined beforehand. If something is referenced in an RDF dataset, by implication it exists as a concept that can be used. Ontology declarations, types, labels, and so on can be added at any later time, and do not have to be inserted into any specific place in the data.

Identifiers are uniform across the entire data set, so that the data can be broken up into modules or sections and still maintain any cross-references.

If the data is actually stored in a serialized RDF file, many global changes can be made using textual search and replace operations. This can be a real advantage during the fluid phase that is the milieu of this paper.
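As a rough illustration of such a textual change, the following Python sketch renames an identifier wherever it appears as a complete rdf:about or rdf:resource value, while leaving ordinary text (such as labels) alone. The function name rename_identifier, the sample identifiers, and the assumption that identifiers are written as whole attribute values (rather than being built from XML entities) are illustrative choices, not part of the original project.

```python
import re

def rename_identifier(rdf_xml, old, new):
    """Replace `old` with `new` where it is the whole value of an
    rdf:about or rdf:resource attribute; other text is left untouched."""
    pattern = re.compile(
        r"""(rdf:(?:about|resource)\s*=\s*)(['"])""" + re.escape(old) + r"\2"
    )
    # A function is used for the replacement so that backslashes in `new`
    # are never interpreted as regex escapes.
    return pattern.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{new}{m.group(2)}", rdf_xml
    )

# Hypothetical fragment: the label also contains the string "OFS".
SAMPLE = (
    "<Organization rdf:about='OFS'>\n"
    "  <rdfs:label>OFS (Office for Federated Systems)</rdfs:label>\n"
    "</Organization>\n"
    "<Program rdf:about='P1'>\n"
    "  <managed-by rdf:resource='OFS'/>\n"
    "</Program>\n"
)

print(rename_identifier(SAMPLE, "OFS", "Office-FS"))
```

A plain editor search-and-replace does the same job when the identifier string is distinctive; restricting the match to attribute values only matters when the identifier also occurs inside labels or comments.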

2.4 Hindrances To RDF

The perennial lack of tools that are easy for non-specialists to understand and to use for their ordinary tasks continues to be an issue. It can be difficult for managers and modelers to understand why RDF would offer any benefits, and there is a definite learning curve if they decide to try. RDF syntax is also perceived as difficult, although non-XML formats such as Turtle and possibly N3 may be helpful here.

However, there is a larger issue. There needs to be an approach for using RDF, essentially, a process model, so that would-be users can understand how to proceed, and then what to do with the results. The present work is an attempt to provide such an approach. The author used an earlier version to accomplish a real-world task that would have been awkward and much more time-consuming if done some other way.

2.5 What's Needed

A list of requirements - or at least, of wishes - is easy enough to put together, based on the foregoing:

  1. A data format that looks like what a (non-specialist) person would write informally.
  2. A way to easily and reliably turn the above text into RDF.
  3. Easy ways to display the data, or subsets of the data, for a range of purposes.
  4. The ability to make local and global changes with minimal effort.
  5. It should be possible to have undefined or undescribed entities in the dataset, so that elements can be added even when their characteristics are not known.
  6. It should be possible to reclassify entities, even after instance data has been entered.
  7. It should be possible to extract the vocabulary (ontology) for study or for presentation to others.
  8. It should be possible to create data entry templates, to be used to guide data collection workers.
  9. It should be easy to add new properties when they are discovered as the model evolves, and even during data collection activities.
  10. It should be feasible to take data collected using the (paper) templates and get it into the data set, preferably with as little human massaging as possible.
  11. It should be feasible to transform the dataset into other forms suitable for import into established tools, such as UML modelers, EA programs, etc.

2.6 This Paper's Method

The work described here would score perhaps 80% if rated against the wish list in Section 2.5. It does require one person to set up the system. That person needs to have some knowledge of RDF, and preferably of XSLT transformations, since they are used to format the data for display and to extract subsets. Once a basic model has been sketched out - simply to have a starting point - it can be captured and extended. Templates can be created automatically and farmed out to non-specialists to collect instance data. All these steps will be illustrated.

2.7 The Example Model

Figure 1 is a simple graphic showing a small part of a putative high-level EA model. It was extracted from a much larger model developed for an actual project. The example is deliberately kept small so it can be illustrated and evolved in the paper.

Figure 1: The Example EA Model

In the graphic, the arrowheads show which way to read the verb phrases. The relationship lines are labeled with more than one phrase, but separate lines are not shown for each type of relationship so as to keep the diagram simple. For example, an Organization (may) manage a Program, and also (may) fund a Program.

This graphic may seem overly simple and imprecise, but it is typical (though just a fragment) of models in the early phase of development. This is just the phase that the present RDF approach is well suited for.

3.0 The Practical RDF Subset And How To Format It

This section covers a subset of RDF that is fairly easy to write, and is nearly isomorphic to the way in which many people naturally write down structured data. For reasons that will be made clear, the RDF subset is expressed using standard W3C RDF-XML serialized syntax. Also covered is a structured text format that can generate the RDF-XML subset. This structured text format is optional - the approach works well even without it - but useful for reducing typing and improving readability.

3.1 Indented Lists

Most people find it natural to write down structured information in the form of an indented list. Graphical diagrams are popular for showing high level structure, but the more there are repetitive data items, and the more detailed the information, the greater is the preference for the list form. Here is an ordinary shopping list:

Fruit
   apples
   bananas

milk
eggs

Paper products
   paper towels
   napkins

Suppose that we want to write down some data that will be used to populate the model shown in Figure 1. Here is a fairly natural way to capture some basic information about an Organization named the Office for Federated Systems:

Organization
  org-id:OFS
  name:Office for Federated Systems
  member of:Federated Umbrella Organization

This fragment of instance data fits the part of the graphic that declares that an Organization may be a member of another Organization. One can imagine a list like this captured in a word processor, text editor, or spreadsheet. In passing, notice that we had to add a name (and also, in this case, a made-up identifier, the org-id property), even though these properties do not appear in the high-level diagram. Without the name or id, we would have no way to know what entity we were talking about. The addition of detail to the model like this foreshadows the real-world evolution of a model that will be discussed in Section “4.0 How RDF Adjusts To Evolving Models”.

The RDF subset and syntax have been chosen to closely mimic indented lists like the samples above. Here is the previous example, showing the structured text form on the right:

      Naïve List                                      RDF as Structured Text
Organization                                      Organization::
  org-id:OFS                                        rdf:about::OFS
  name:Office for Federated Systems                 rdfs:label::Office for Federated Systems
  member of:Federated Umbrella Organization         member-of::
                                                      rdf:resource::Federated-Umbrella-Organization 

The structured text can be converted to RDF-XML by a simple program. We show the result, and then explain the formatting conventions.

     RDF as Structured Text                             RDF in XML Format
Organization::                                      <Organization
  rdf:about::OFS                                      rdf:about='OFS'>
  rdfs:label::Office for Federated Systems            <rdfs:label>Office for Federated Systems</rdfs:label>
  member-of::                                         <member-of
    rdf:resource::Federated-Umbrella-Organization       rdf:resource='Federated-Umbrella-Organization'/>
                                                    </Organization>

Conventions for Structured Text

  1. Indentation is significant. All lines that are indented the same amount under another become child nodes of the parent node in the resulting XML.
  2. Element and attribute names are terminated by a double colon (::).
  3. When an element or attribute has a literal string value, that value may appear on the same line (after the double colon), or on the next line (the value must be indented farther than its parent).
  4. All element and attribute names must appear indented on a new line under their parent.
  5. XML attribute names are prefixed with the at-sign character (@). However, three special RDF attributes are recognized as attributes even without the leading @ character. They are rdf:about, rdf:resource, and rdf:parseType.
  6. The transformation is not namespace-aware. So colons in element names are not treated as special. They are simply copied to the output.
  7. The transformation is not aware of XML entities. So strings like &arch;big-corp are copied to the output as is. &arch; therefore will appear as an XML entity in the XML output.
  8. A line whose first non-whitespace character is a semicolon (;) is considered a comment line, and is ignored. The comment does not appear in the XML result.
  9. A line whose first two characters are ;: is copied to the output as is, at the time it is found by the translator. This feature is intended to pass through the XML declaration and the DTD verbatim.

The translation of structured text into RDF-XML was done using a Python program written by the author. The structured text format has proved to be pleasant to type and reasonable to proofread.
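The author's translator itself is not reproduced in this paper. Purely as an illustration of how the conventions above might be implemented, here is a minimal Python sketch. All function names are hypothetical, and it makes simplifying assumptions: indentation uses spaces only, values are not XML-escaped (so that entity references pass through unchanged, as in convention 7), a value placed on its own line must not contain a double colon, and pass-through lines are gathered at the top of the output rather than emitted where they are found.

```python
SPECIAL_ATTRS = {"rdf:about", "rdf:resource", "rdf:parseType"}

def parse(text):
    """Return (passthrough_lines, root_nodes) for a structured text document."""
    passthrough, roots, stack = [], [], []       # stack holds (indent, node)
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        if raw.startswith(";:"):                 # convention 9: copy verbatim
            passthrough.append(raw[2:])
            continue
        if stripped.startswith(";"):             # convention 8: comment line
            continue
        indent = len(raw) - len(raw.lstrip())
        name, sep, value = stripped.partition("::")
        while stack and stack[-1][0] >= indent:  # climb back to the parent
            stack.pop()
        parent = stack[-1][1] if stack else None
        if not sep:                              # convention 3: value on its own line
            if parent is not None:
                parent["text"] = stripped
            continue
        if name.startswith("@") or name in SPECIAL_ATTRS:   # convention 5
            if parent is None:
                raise ValueError(f"attribute without a parent element: {name}")
            parent["attrs"].append((name.lstrip("@"), value.strip()))
            continue
        node = {"name": name, "attrs": [], "text": value.strip(), "children": []}
        (parent["children"] if parent else roots).append(node)
        stack.append((indent, node))
    return passthrough, roots

def to_xml(node, depth=0):
    """Serialize one node; names and values are copied through unchanged."""
    pad = "  " * depth
    attrs = "".join(f" {k}='{v}'" for k, v in node["attrs"])
    start = f"{pad}<{node['name']}{attrs}"
    if node["children"]:
        inner = "\n".join(to_xml(child, depth + 1) for child in node["children"])
        return f"{start}>\n{inner}\n{pad}</{node['name']}>"
    if node["text"]:
        return f"{start}>{node['text']}</{node['name']}>"
    return f"{start}/>"

def convert(text):
    passthrough, roots = parse(text)
    return "\n".join(passthrough + [to_xml(root) for root in roots])

SAMPLE = """\
;:<?xml version='1.0'?>
; a comment line, dropped from the output
Organization::
  rdf:about::OFS
  rdfs:label::Office for Federated Systems
  member-of::
    rdf:resource::Federated-Umbrella-Organization
"""

print(convert(SAMPLE))
```

Run on the Organization example, the sketch yields the same elements and attributes as the RDF-XML shown earlier, differing only in whitespace. A complete document would also need the enclosing rdf:RDF element and its namespace declarations, which can be written in the structured text like any other element.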

Comparing the three versions - the informal list, the structured text, and the RDF-XML - we see that they are nearly identical, except for a small amount of structural overhead, more for the XML version.

The use of rdf:about vs. rdf:ID

The reader who is more familiar with RDF may notice the consistent use of the rdf:about construction instead of rdf:ID, which would be an alternative means of identifying a resource. rdf:ID has two disadvantages which together rule against its use in this application. First, in any dataset, there can only be a single rdf:ID attribute with a given value. But in the relatively informal way of working, it would be easy for a person (or persons, if several were working on the same project) to accidentally insert the same rdf:ID value in more than one place. Second, when rdf:ID is used, an RDF processor will require that a "#" character be inserted into the relative URI. This may conflict with the design of the project's URIs, and the way in which they are constructed from base URIs and default namespaces. Using rdf:about consistently avoids these problems, and is understood perfectly by any RDF processor.
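The two points can be made concrete with a small Python sketch. The base URI below is a made-up example, and the helper names are hypothetical. The sketch relies on the rule in the RDF/XML specification that rdf:ID="x" is interpreted like rdf:about="#x", with both resolved against the base URI in effect.

```python
from collections import Counter
from urllib.parse import urljoin

BASE = "http://example.org/arch/"   # hypothetical xml:base for the project

def uri_for_about(base, value):
    """rdf:about values are resolved against the base like any relative URI."""
    return urljoin(base, value)

def uri_for_id(base, value):
    """rdf:ID values are resolved as a fragment identifier on the base."""
    return urljoin(base, "#" + value)

def duplicate_ids(id_values):
    """Return rdf:ID values that occur more than once (an error in RDF-XML)."""
    return sorted(v for v, n in Counter(id_values).items() if n > 1)

print(uri_for_about(BASE, "OFS"))   # http://example.org/arch/OFS
print(uri_for_id(BASE, "OFS"))      # http://example.org/arch/#OFS
print(duplicate_ids(["OFS", "P1", "OFS"]))
```

The same short name thus yields two different URIs depending on which attribute is used, which is one more reason to pick one form and apply it consistently.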

3.2 Striped Format

The basic structure of striped format is this:

thing
   property:value
   property:value
thing
   property:value
   property:value
thing
   property:
      thing2
         property:value
         property:value

The key point is that properties and their values alternate. Note that the "value" of a property can be another object, or else a simple value, which we will take to always be a string of characters. If the value is a resource (instead of a literal), then it may in turn have its own properties listed. The block starting with "thing2" is an example of this recursive character.

The ability of striped format to continue recursively is important for our purposes, because it gives us a way to add new structure to objects that have already been defined. This will be illustrated later. An alternative way to write RDF is as a flat, or independent, series of statements (also called triples). For example, the N-Triples notation (a non-XML format) is inherently flat, and is not suited for an indented approach. The flat approach does not mirror the recursive indented list, which seems to be so natural for people.
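To show how the striped and flat forms relate, the following Python sketch walks a small striped RDF-XML document and emits the equivalent flat triples. It is only a sketch under stated assumptions: it handles just the restricted subset discussed here, it strips namespaces from names for readability, it does not resolve relative identifiers against a base URI, and the located-at property, the Site type, and the example namespace are invented for the illustration.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns="http://example.org/arch#">
  <Organization rdf:about="OFS">
    <rdfs:label>Office for Federated Systems</rdfs:label>
    <member-of rdf:resource="Federated-Umbrella-Organization"/>
    <located-at>
      <Site>
        <rdfs:label>Headquarters</rdfs:label>
      </Site>
    </located-at>
  </Organization>
</rdf:RDF>
"""

def local(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to names."""
    return tag.rsplit("}", 1)[-1]

def walk_node(node, triples, ids):
    """Handle a 'thing' stripe; its child elements are the property stripe."""
    subject = node.get(f"{{{RDF}}}about") or f"_:b{next(ids)}"
    triples.append((subject, "type", local(node.tag)))
    for prop in node:
        target = prop.get(f"{{{RDF}}}resource")
        if target is not None:          # reference to another resource
            obj = target
        elif len(prop):                 # nested thing: recurse one level down
            obj = walk_node(prop[0], triples, ids)
        else:                           # literal string value
            obj = (prop.text or "").strip()
        triples.append((subject, local(prop.tag), obj))
    return subject

def striped_to_triples(xml_text):
    triples, ids = [], itertools.count(1)
    for node in ET.fromstring(xml_text):
        walk_node(node, triples, ids)
    return triples

for triple in striped_to_triples(SAMPLE):
    print(triple)
```

Each level of indentation in the striped form turns into a group of flat statements that share a subject. The flat list carries the same information but loses the visual grouping that makes the indented form easy to read.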

3.3 Typed Nodes

In RDF-XML syntax, there are several ways to add properties to a resource (i.e., an object, entity, or thing). In the present approach, we insert an element whose identifier is the kind of object we are talking about. Then an RDF processor will create a corresponding item in the dataset, if it doesn't already exist, and assign it the type named by the element. For example, to insert a resource of type Org takes no more than the following:

<Org>
<!-- optional properties go here-->
</Org>

Here a resource will be added of type Org. We've given it no identifier, so it will have none in the RDF graph. (No namespace is used in this example: we assume a default namespace has been defined elsewhere). This is typical of RDF: to name something is to bring it into existence, at least within the RDF dataset. Written in this way, a resource is automatically typed, hence the phrase typed node.

We would normally give the resource a global identifier by constructing a URI for it, and including it in the declaring element:

<Org rdf:about='org-1'>
<!-- optional properties go here-->
</Org>

In structured text format, this would be

Org::
   rdf:about::org-1
   ; more optional properties go here

Here, the identifier is "org-1". (Once again, only part of the identifier is shown: we assume a default base for the URI has been defined by declaring an xml:base value elsewhere). It is perfectly valid to have several Org elements with the same rdf:about value. They all refer to the same resource. In this way, we can step outside of the striped format and add new properties to a resource anywhere in the RDF document that we like. Normally, though, it is more convenient to keep them all in one place. This practice helps people keep track of everything in the model.

In a similar way, a property springs into existence when its type (in this case, its URI identifier) is used in an RDF statement. It is not necessary to define properties anywhere else; if a property is used even once, it exists so far as the dataset is concerned. You might want to define the property type, if only to assign it a nice, readable name, but you don't have to.

In the example shown in Section 3.1, there is a member-of property whose value is rdf:resource='Federated-Umbrella-Organization'. Two points are of interest here. First, our dataset has not (yet) declared anything about the member-of property type. An RDF processor will be able to use it anyway, so we can wait until later (or never) to add more information about it.

Second, rdf:resource is the way we reference another resource - we refer to its RDF identifier. In this case, no resource with that identifier has been declared yet. Once again, an RDF processor will have no problem; it will simply create a resource in the RDF dataset with that identifier. That resource would not have a type assigned. However, we could add that bit of information later. In the meantime, our dataset is perfectly usable as it is.

Except for the use of the identifiers rdf:about and rdf:resource, this RDF fragment looks remarkably like the informal list original.

3.4 Blank Nodes and rdf:Resource

The example in Section 3.2 shows a further level of breakdown in the last "thing". It is common to need to structure a property. One well-known example is an address. We would tend to write an address like this, if we didn't want it to just be a long, undifferentiated string:

person
   name:John
   address
      street:1234 5th Ave
      city:New York
      state: NY
      zip:10002

The problem here is that one property (e.g., street) is directly indented under another (address). But in striped format, a property has to be followed by either a literal value or a typed resource (one that is not a property). In RDF, this is normally handled by inserting a resource for the entire address. Here's how it would look:

address
   Address-Object
     street:1234 5th Ave
     city:New York
     state: NY
     zip:10002

Now the entire structured address is represented by a node or resource, of type Address-Object. Typically, such subsidiary resources are not given their own identifiers. They are instead identified by their parent resource. In a relational database, they would be in a dependent table, as opposed to an independent one. Resources without their own global identifiers are called anonymous nodes, blank nodes, or bnodes.

In fact, these bnodes don't necessarily even have to be typed. So we don't have to assign a type like Address-Object if we don't want to.

RDF-XML syntax has a minor shortcut for creating bnodes. An attribute, rdf:parseType='Resource', is added to the element for the property. So our example looks like this:

    RDF-XML version                          Structured text version
<person                                 person::
   rdf:about='john'>                       rdf:about::john
   <address                                address::
       rdf:parseType='Resource'>              rdf:parseType::Resource
      <street>1234 5th Ave</street>           street::1234 5th Ave
      <city>New York</city>                   city::New York
      <state>NY</state>                       state::NY
      <zip>10002</zip>                        zip::10002
   </address>
</person>

When an RDF processor sees this shortcut (the rdf:parseType='Resource'), it inserts a bnode into the dataset, and all the subsequent properties in that block get attached to it. This is just what we need, since our subset of RDF requires us to have a strict alternation between properties and things.
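What the processor does with this shortcut can be imitated in a short Python sketch. It handles only the small striped subset used here (typed nodes with rdf:about, literal properties, rdf:resource references, and rdf:parseType='Resource'); the function name triples and the "_:b1" style of blank node label are assumptions for illustration, and a real RDF parser does considerably more.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <person rdf:about="john">
    <address rdf:parseType="Resource">
      <street>1234 5th Ave</street>
      <city>New York</city>
    </address>
  </person>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, property, object) triples for a small striped subset."""
    counter = itertools.count(1)
    out = []

    def walk_properties(subject, node):
        for prop in node:
            if prop.get(RDF + "parseType") == "Resource":
                # The shortcut: invent a blank node, then attach the
                # nested properties to it instead of to the parent.
                bnode = "_:b%d" % next(counter)
                out.append((subject, prop.tag, bnode))
                walk_properties(bnode, prop)
            elif prop.get(RDF + "resource") is not None:
                out.append((subject, prop.tag, prop.get(RDF + "resource")))
            else:
                out.append((subject, prop.tag, (prop.text or "").strip()))

    for node in ET.fromstring(rdf_xml):
        subject = node.get(RDF + "about")
        out.append((subject, "rdf:type", node.tag))
        walk_properties(subject, node)
    return out


result = triples(SAMPLE)
for triple in result:
    print(triple)
```

The address property ends up pointing at the generated blank node, and street and city hang off that node, which preserves the alternation between properties and things.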

This shortcut lets us stay with our natural indented-list-like format. The use of this shortcut is unfortunately one new thing for a person to learn, but it is fairly minor. It is harder for a non-modeler to understand when something is a property and when it should be an entity (as in this case, where both address and street are property names, and imply an entity located between them). But a modeler who does understand the distinction can either publish some examples beforehand, or fix up incorrect entries later, before processing them into the RDF dataset.

3.5 Identifiers and Namespaces

Up until now, our examples have been simplified by using default namespaces and an implied xml:base. The latter supplies the first part of relative URIs in attributes in an XML document, so one does not have to write them out every time. This section shows how to declare these useful things, and how to make use of more than one namespace yet still minimize typing of long identifiers.

To do this, we make use of some of the strengths of XML, namely, the ability to declare aliases for namespaces and for arbitrary strings. We start with a single namespace. We want all of our identifiers to start, let us say, with http://www.example.com/passin/extreme2007/. This is the "namespace" for the URIs we are using as identifiers. Thus, our Organization is really http://www.example.com/passin/extreme2007/Organization.

We certainly don't want to type names like that all the time, and they would make it hard to read and to check the data. In XML, to use a namespaced identifier for an element name, one declares a prefix to represent the namespace part of the name. If we define our prefix to be example, our element name becomes example:Organization. This is much more practical. If we also define the default namespace to equal our namespace, then we can dispense with the prefix altogether, and get back to the simple, clean form used in the examples up until now.

We can declare the namespace prefix and the default namespace on the root element of the document, which in our RDF documents will be rdf:RDF. We also define the standard rdf and rdfs prefixes.

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:example='http://www.example.com/passin/extreme2007/'
>

To make our namespace be the default namespace, we write

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns='http://www.example.com/passin/extreme2007/'
>

; In structured text format
rdf:RDF::
   @xmlns:rdf::http://www.w3.org/1999/02/22-rdf-syntax-ns#
   @xmlns:rdfs::http://www.w3.org/2000/01/rdf-schema#
   @xmlns::http://www.example.com/passin/extreme2007/

This lets us use our simple element names without a prefix (we can include both forms at once, if we like, prefixed and default).

It is a little more involved to simplify attribute values that are URIs that identify resources. First, let's look at a typical value from the example. In the fragment <Organization rdf:about='OFS'>, the entire identifier would be http://www.example.com/passin/extreme2007/OFS. Namespace prefixes can't be used in attribute values. So we turn to XML entities.

3.6 Identifiers and XML Entities

In XML we can declare entities in the DTD, or Document Type Declaration. The term entity as used here is distinct from its use in modeling as a substitute for object, thing, or resource. If we declare an entity named example, then we can use it by writing &example; , and an XML processor will insert the entire string value. Thus, our OFS identifier could be written as &example;OFS.

To declare this entity, we insert a small DTD into the document before the root element:

<!DOCTYPE rdf:RDF [
   <!ENTITY example "http://www.example.com/passin/extreme2007/">
]>

If we only need one of these entities (we will see why we might want more in Section 3.12), we could avoid the DTD altogether, and rely on the xml:base attribute. When this attribute is present, relative URIs in RDF attribute values such as rdf:about and rdf:resource are resolved against the value of xml:base. This means that our URI identifiers can be written in the simplest, most readable way. Here is how to incorporate xml:base:

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xml:base='http://www.example.com/passin/extreme2007/'
>

With xml:base included, every time we write the attribute value OFS, an RDF processor will resolve it to the full value http://www.example.com/passin/extreme2007/OFS.
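The resolution step follows the ordinary rules for relative URIs, so it can be tried out with Python's standard urljoin function. This is a sketch of the URI arithmetic only, not of an RDF processor; the helper name resolve is an assumption for the example.

```python
from urllib.parse import urljoin

BASE = "http://www.example.com/passin/extreme2007/"


def resolve(identifier, base=BASE):
    """Resolve a (possibly relative) identifier against a base URI."""
    return urljoin(base, identifier)


# A short relative identifier picks up the whole base.
print(resolve("OFS"))

# Without the trailing slash, the last path segment of the base is replaced.
print(resolve("OFS", "http://www.example.com/passin/extreme2007"))

# An identifier that is already absolute is left alone.
print(resolve("http://example.org/x"))
```

The second call is the case to watch for: a base value that lacks its trailing slash quietly yields different identifiers from the ones intended.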

To sum up, the use of these XML entities lets us write simple RDF fragments without long URIs, which are tedious, easy to make an error with, and hard to read. When should we skip the entities and just use xml:base, and when should we define and use entities? It depends on whether you plan to use more than one namespace in your identifiers. If there will be only one (of your own - not counting the rdf, rdfs, owl, etc., namespaces), then declare it as the default namespace, and make it the value of xml:base.

If you are going to use more than one namespace of your own - which might be because of the way you plan to modularize your model, or to separate the ontology from instance data - then define one entity and one namespace prefix for each namespace. It is best to use the same name for the entity and the corresponding prefix. For example, if we use arch and our namespace is http://www.example.com/passin/extreme2007/, then declare an "arch" prefix and an "arch" entity, each having the namespace string as its value. Then our elements will look like <arch:Organization> and our identifiers will look like &arch;ABC-Corp. This is reasonably easy to remember in practice.
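To check that a matching prefix and entity really produce full URIs, one can parse a small document and inspect the result. The sketch below uses Python's standard library parser, which expands entities declared in the internal DTD subset; the sample document is an assumption built from the arch example above.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hypothetical document: one "arch" entity and one "arch" prefix,
# both carrying the same namespace string.
DOC = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
   <!ENTITY arch "http://www.example.com/passin/extreme2007/">
]>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:arch="&arch;">
  <arch:Organization rdf:about="&arch;ABC-Corp"/>
</rdf:RDF>"""

root = ET.fromstring(DOC)
org = root[0]

# The element name is reported with its expanded namespace ...
print(org.tag)
# ... and the entity in the attribute value has been replaced by the full string.
print(org.get(RDF + "about"))
```

Both the element name and the identifier come out fully qualified, even though neither was written in full in the document.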

Finally, if you do define a number of prefixes and the corresponding entities, it is still a good idea to make the most common namespace be the value of the default namespace and of xml:base. That way, at least the most common one won't have to be written explicitly.

3.7 Classification And OWL

To sketch out a model, and to populate it with instance data, it is often not necessary to explicitly define a vocabulary. For our example, we can use Organization perfectly well without making it some kind of a class. However, there can be benefits to including an ontology, even if it is rudimentary. For one thing, we can use it to define nice, readable, labels for our entity types. More usefully, we can use an ontology to extract just the classes and their properties from the instance data. We could do this anyway, but having an ontology makes it easier.

Since we are planning to use RDF, it makes sense to use [OWL] for declaring classes and properties. All we have to do is to add the owl namespace to the rdf:RDF element, and then create an owl:Class element for each entity we plan to work with, like Organization. We can use owl:ObjectProperty for properties (with this style of modeling, we probably won't need to use owl:DatatypeProperty, but it's there if we want it). We can also declare subclasses, if that would seem to be useful. Recall that OWL makes use of [RDFS], the RDF Schema language, for certain relationships such as subClassOf. So we have to add the RDFS namespace to the rdf:RDF element, too. Because OWL and RDFS statements are themselves expressed in RDF, we can use the same format for them as we have already been using. Here is an example:

<owl:Class rdf:about='ObservingSystem'>
    <rdfs:label>Observing System</rdfs:label>
    <rdfs:subClassOf rdf:resource='System'></rdfs:subClassOf>
</owl:Class>

; Or, as structured text
owl:Class::
   rdf:about::ObservingSystem
   rdfs:label::Observing System
   @rdfs:subClassOf::
      rdf:resource::System

   ; Note that we need to use the "@" character, because
   ; rdfs:subClassOf is not one of the three special
   ; attributes that are predefined in the translator.

Here, we've declared the class of ObservingSystem, and said that it is a subclass of the class System. Once again, RDF behaves in its usual flexible way - if we forget to declare the System class, an RDF processor just goes ahead and sticks it into the dataset anyway. Of course, the processor wouldn't know that System is an OWL class, but it would know that ObservingSystem is a subclass of System.
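A quick way to see which classes are declared and which are only referenced is to scan the ontology for owl:Class elements. The Python sketch below does this with the standard library XML parser; it is an illustration, not an OWL processor, and the function name class_report is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

ONTOLOGY = """<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="ObservingSystem">
    <rdfs:label>Observing System</rdfs:label>
    <rdfs:subClassOf rdf:resource="System"/>
  </owl:Class>
</rdf:RDF>"""


def class_report(rdf_xml):
    """Return (labels of declared classes, parent lists, undeclared parents)."""
    root = ET.fromstring(rdf_xml)
    declared, parents = {}, {}
    for cls in root.findall(OWL + "Class"):
        ident = cls.get(RDF + "about")
        # Fall back to the identifier when no rdfs:label is given.
        declared[ident] = cls.findtext(RDFS + "label", default=ident)
        for sub in cls.findall(RDFS + "subClassOf"):
            parents.setdefault(ident, []).append(sub.get(RDF + "resource"))
    referenced = {p for plist in parents.values() for p in plist}
    undeclared = sorted(referenced - set(declared))
    return declared, parents, undeclared


declared, parents, undeclared = class_report(ONTOLOGY)
print(declared)
print(parents)
print(undeclared)
```

System shows up in the undeclared list, which is exactly the situation described above: it is usable as a parent class even though nothing yet says it is an OWL class.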

Here is an example of declaring a property as an OWL ObjectProperty. The main value, at this phase of modeling, is for providing a nice, readable label for the property, and for making the design of the stylesheet for listing all the properties easier. Later, once the model fleshes out and stabilizes, it might make sense to add more features to the property. These features might include a domain and range, but at this stage, adding them would be of no value (not only have we no need for domain and range at an early stage of development, but we probably aren't sure yet what values they should have).

<owl:ObjectProperty rdf:about='member-of'>
    <rdfs:label>member of</rdfs:label>
</owl:ObjectProperty>

; Or as structured text
owl:ObjectProperty::
   rdf:about::member-of
   rdfs:label::member of

   ; Note that the rdf:about value is a URI and had best
   ; not include spaces. rdfs:label defines a
   ; readable name (which here does contain a space).

Here we make use of the standard property rdfs:label for declaring readable names (i.e., labels).

3.8 A Working Example

This section presents a more fleshed-out example. It is adapted, with changes, from an actual project. It expands on the model fragment depicted in Figure 1 in Section “2.7 The Example Model”. It contains a small ontology that declares a few OWL classes and properties, taken right off the diagram of the model. Then it continues with a little instance data. In Section “5.0 Case Study - Populating And Evolving The Model”, we will show how using RDF supports us as we extend and evolve this fragment of the model.

This fragment uses the striped RDF-XML syntax discussed in Section “3.2 Striped Format”, expressed in the simpler structured text format presented in Section “3.1 Indented Lists”. The DTD and the enclosing rdf:RDF elements are omitted to shorten and simplify the fragment.

; A little OWL ontology declaring classes and properties
owl:Class::
   rdf:about::Org
   rdfs:label::Organization
owl:Class::
   rdf:about::Program
   rdfs:label::Program
owl:Class::
   rdf:about::DataSystem
   rdfs:label::Data System
   @rdfs:subClassOf::
      rdf:resource::System

owl:Class::
   rdf:about::ObservingSystem
   rdfs:label::Observing System
   @rdfs:subClassOf::
      rdf:resource::System

owl:ObjectProperty::
   rdf:about::member-of
   rdfs:label::member of
owl:ObjectProperty::
   rdf:about::partof
   rdfs:label::Part Of
owl:ObjectProperty::
   rdf:about::allied-with
   rdfs:label::Allied With

; Here is a bit of instance data
Org::
   rdf:about::ofs
   rdfs:label::Office for Federated Systems
   acronym::OFS
   homepage::http://www.example.com/ofs
   member-of::
      rdf:resource::fed-umbr-org
   related-org::
      rdf:parseType::Resource
      role::
         rdf:resource::fundedby
      org::
         rdf:resource::lga

Org::
   rdf:about::fed-umbr-org
   rdfs:label::Federated Umbrella Organization
   acronym::FUO

Several lines have been highlighted here to point them out for discussion. First, the two classes Observing System and Data System have been declared to be subclasses of System, but System itself has not been declared. This is fine in RDF; later, we can add a declaration that it is an OWL class, if we like. Note that System does not occur in the graphic base model; it has been added because it seemed obvious that the two xxx-System classes ought to be subclasses of some common parent class.

Next, the use of rdf:parseType='Resource' is highlighted. This practice was covered in Section “3.4 Blank Nodes and rdf:Resource”. It is used here because there is structure beneath the related-org property. This relationship is not modeled as a complex one in the original diagram. But the modeler suspects in this case that there will be more to say about the relationship. By turning the relationship into a thing to be talked about (that is, a blank or unidentified node), this perceived future need can be accommodated.

Finally, in the last highlighted line, a new resource, lga, is referenced. This resource does not exist anywhere else in the dataset, due to a (pretend) oversight. The modeler has forgotten that lga is yet undefined. But in RDF, as has been stressed many times now, this is quite all right. It can be defined somewhere else, at a later time. In the meantime, the RDF is perfectly correct and an RDF processor will understand what is meant.

These references to missing definitions have been included to emphasize the flexibility of RDF, which in part makes it so suitable for this fluid phase of modeling. Ironically, many people seem to think that RDF is rigid and complex, and that it nearly requires the use of predefined ontologies. Nothing could be farther from the truth.

3.9 Displaying The Model

Section “7.0 Using The Data” covers the display of the RDF dataset in ways that would be useful for typical modeling tasks. Here we mention that the use of the subset of striped syntax presented in this paper is very amenable to the use of XSLT transformations to present different views of the data. For the general RDF syntax, the use of XSLT is very difficult because of all the different ways the same information can be written.

The dataset can also be visualized using any of the available RDF visualizers (such as the W3C RDF Validator, discussed in the next section), but when the dataset starts to become larger, these tend to become less useful. When there is too much data, or too detailed a data structure, for a graphic visualization to work, the best kind of display is usually some kind of indented list, with different parts grouped according to the intended use of the display.
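As a rough illustration of such a grouped, indented listing, the following Python sketch groups instances under their class names. The paper's own displays are produced with XSLT; Python is used here only to keep the example self-contained, and the sample data and the function name listing are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

DATA = """<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <Org rdf:about="ofs"><rdfs:label>Office for Federated Systems</rdfs:label></Org>
  <Org rdf:about="fed-umbr-org"><rdfs:label>Federated Umbrella Organization</rdfs:label></Org>
  <ObservingSystem rdf:about="central-fargo-obs-sys">
    <rdfs:label>Central and Fargo Observing System</rdfs:label>
  </ObservingSystem>
</rdf:RDF>"""


def listing(rdf_xml):
    """Build an indented listing of instance labels grouped by class name."""
    groups = defaultdict(list)
    for node in ET.fromstring(rdf_xml):
        ident = node.get(RDF + "about")
        # Use the readable label when present, otherwise the identifier.
        groups[node.tag].append(node.findtext(RDFS + "label", default=ident))
    lines = []
    for cls in sorted(groups):
        lines.append(cls)
        lines.extend("   " + label for label in sorted(groups[cls]))
    return "\n".join(lines)


report = listing(DATA)
print(report)
```

Because every top-level element in the striped subset is a typed node with its properties directly beneath it, a single pass over the children of rdf:RDF is enough to build the groups.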

To anticipate the later discussion, we include here an example of a listing of the model, albeit in a different state of development. This screen capture shows all the instances in the model, grouped under their respective classes. It was generated by an XSLT transformation operating on the RDF dataset.

Figure 2: One Way To Display A Model's Instance Data

3.10 Using A Validator

Even though the title of this paper uses the word "Easy", getting the details correct for both the striped form and the construction of the declarations needed in the DTD can be tricky, what with entity declarations, xml:base, and the like. Once one has worked them out, it all seems easy enough. Following the examples and instructions in this paper will prevent most of the problems. Nevertheless, there is no substitute for using an RDF validator to make sure that the modeling constructs do what is wanted, and that the RDF-XML is well-formed in the XML sense.

This paper recommends the online RDF validator maintained by the W3C. It can be found at http://www.w3.org/RDF/Validator/. This validator not only checks the XML and the RDF, but also displays the resulting RDF model. This is valuable because in some cases the RDF processor needs to add structures to the dataset, and it is well to make sure one understands what will be inserted. Also, seeing the results gives one a chance to make sure what the dataset is defining, in RDF terms.

The validator can be set to show a complete list of all statements (triples) in the model, to show a graph of the model, or both. The author's recommendation is to select just the graphical view.

Here is the graph produced by the W3C Validator for the model RDF fragment shown in Section “3.8 A Working Example”. It has been reduced in size to make it visible in one screen width. This has made the text unreadable, but the reduced size shows at a glance the kind of display available from the validator. Note that to have the validator process the example, it was necessary to declare prefixes for the RDFS and OWL namespaces, and both xml:base and a default namespace had to be declared.

Figure 3: Graphical Display Of The Model Listed in Section “3.8 A Working Example”

With the namespace declarations and the xml:base attribute, the rdf:RDF element looks like this:

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns:owl="http://www.w3.org/2002/07/owl#"
   xml:base="http://example.org/arch/"
   xmlns="http://example.org/arch/" 
>

The additions mentioned above have been highlighted in boldface. Only the last two were created for this particular RDF fragment (the owl and rdfs namespaces being standards). After making the additions to the rdf:RDF opening tag, the structured text fragment from Section “3.8 A Working Example” was converted to RDF-XML and pasted into the text box on the validator page. The validator reported a successful validation, and produced the graph shown (except larger, with readable text).

3.11 Using Modules

When a model becomes large, or has many different kinds of things that one might like to keep separate, the XML machinery that comes with the use of RDF-XML provides ways to split the data into modules. The main RDF document can be arranged to be a shell that declares namespaces and imports the parts. This section covers the method used by the author.

To import a fragment of RDF-XML into the dataset, declare it as an external entity. This is done by inserting an ENTITY declaration in the DTD of the main RDF-XML document. For example:

<!ENTITY orgs SYSTEM 'organizations.ent'>

This declaration defines an entity named orgs, and equates it with the document organizations.ent. To have the processor place the contents of organizations.ent into the dataset, simply refer to the corresponding entity in the body of the main RDF document, like this:

<!DOCTYPE rdf:RDF [
   <!ENTITY orgs SYSTEM 'organizations.ent'>
]>

<rdf:RDF
   ....
>

&orgs;
...
</rdf:RDF>

Even the DTD can be placed in its own document:

<!DOCTYPE rdf:RDF SYSTEM "arch_rdf.dtd">
<rdf:RDF
   ....
>

&orgs;
...
</rdf:RDF>

The external DTD (arch_rdf.dtd) contains, in this simple example, just the external entity declaration:

<?xml version='1.0' encoding='utf-8'?>
<!ENTITY orgs SYSTEM 'organizations.ent'>

The module organizations.ent would contain, for example, the data listed in Section “3.8 A Working Example”. The module can use entities and prefixes that are defined in the DTD or in the rdf:RDF element in the main document. In a dataset split up into many modules, the DTD would contain one external entity declaration, and the main document one entity reference, for each module.

In practice, one might use a module for the ontology (if any), and one for instance data for each of the top-level classes. Remember that RDF statements in any module can refer to a resource from any other module, simply by using its RDF identifier. And if the resource has not yet been defined in any other module, an RDF processor will simply create an entry in the dataset at the time it encounters the reference.

3.12 Example Of A Complete Set of Files and Modules

In this section, by way of illustration, we set forth the complete set of modules for the example used in this paper. The complete content of each of the files is available from the author's web site at www.tompassin.net/pub/extreme2007/. Example XSLT transforms can also be found at the same location. The basic files are the following:

arch_rdf.dtd                    The DTD. Defines entities.
arch.rdf                        Skeleton of the RDF file.  Imports the modules.
basemodel_ontology_ent.txt      Basic class and property definitions.
orgs_ent.txt                    Structured text module for organization data.
persons_ent.txt                 Structured text module for data about persons.
systems_ent.txt                 Structured text module for data about systems.

The DTD declares an entity for each namespace, and an external entity for each module. Here is the DTD:

<?xml version='1.0' encoding='utf-8'?>
<!--================ URI shortcuts =================-->
<!ENTITY model-orgs "http://tpassin.net/architecture/2007-04-10/orgs/">
<!ENTITY model-systems "http://tpassin.net/architecture/2007-04-10/systems/">
<!ENTITY model-person "http://tpassin.net/architecture/2007-04-10/person/">

<!--============= Entity for the base namespace ======-->
<!ENTITY arch-ns "http://tpassin.net/architecture/2007-04-10/">

<!--=========== External entities containing the data ===============-->
<!ENTITY base-ontology SYSTEM 'basemodel_ontology.ent'>
<!ENTITY systems SYSTEM 'systems.ent'>
<!ENTITY persons SYSTEM 'persons.ent'>
<!ENTITY orgs SYSTEM 'orgs.ent'>

The DTD contains three entities labeled "URI shortcuts". For simplicity, they are not actually used in this example data set. However, for a larger and more complex data set, one would probably want to use different namespaces for different kinds of data. These entities make it easy to write the URIs, which would otherwise be long, tedious, and hard to remember. The entity names should be made mnemonic. In practice, these entities would be used in attribute values, like this:

  <Org rdf:about='&model-orgs;TPPA'/>

In structured text form, this would be written as

Org::
   rdf:about::&model-orgs;TPPA

Note that the file names for the entities representing the modules, like systems.ent, are different from their counterparts listed above, such as systems_ent.txt. This is because the latter, which is a structured text file, needs to be converted to an RDF-XML file by running the converter utility. Here, systems.ent is the corresponding RDF-XML file.

The .rdf skeleton file defines all namespaces and imports the data modules. It need contain nothing more. In this example, it is written as an XML file, but it could just as well have been written as structured text. Make sure that the namespace URIs agree exactly with the corresponding entity definitions in the DTD.

<?xml version='1.0'?>
<!DOCTYPE rdf:RDF  SYSTEM "arch_rdf.dtd">
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns='&arch-ns;'
xml:base='&arch-ns;'
>

&base-ontology;
&systems;
&persons;
&orgs;
</rdf:RDF>

Although it is possible to produce a single RDF-XML file that contains all the data, this would normally not be necessary, since any XML or RDF-XML processor will be able to process the data in this modular format. However, it is necessary to produce a single file for submission to the W3C RDF validator. The author uses the RXP XML parser for this purpose.

Processing the files

To produce usable output, once the skeleton RDF file has been created and the structured text files populated with some data, it is only necessary to run the converter utility to turn the structured text into actual RDF-XML. The files can all be processed at once by means of a batch file, or they can be manually converted one at a time. Either way, the process is quick. Then output can be produced by running an XSLT transform. The author normally does this using the SAXON XSLT processor. Of course, other kinds of programs could be written to process the data, but so far, XSLT has proven satisfactory.

4.0 How RDF Adjusts To Evolving Models

In the early phases, the model usually changes frequently. When attempts are made to test the model by populating it with samples of actual instance data, one invariably needs to add properties and to restructure part of the model. RDF adjusts easily to such changes.

4.1 Typical Kinds Of Evolutionary Changes

The most common changes are name changes, splitting a property into structured parts (e.g., telephone numbers or addresses), reclassifying an individual or subclass, creating or changing parent classes, and sometimes, accidentally creating duplicate instances.

4.2 Renaming

A name, that is, a readable label, is usually defined by an rdfs:label property. It should be declared in just one place, so the name will be simple to change. A change could be done by a search-and-replace in a text editor.

There is no restriction to just one rdfs:label. Although most processing software won't be sure what to do with the extra labels, a standard RDF processor will still build the model correctly.

4.3 Adding And Splitting Structure

Probably the commonest, and the most important kind of change, after simply adding new properties, is the structuring of previously simple properties. For example, the model might start out with an address as a literal string value, but later we might need to structure it into components. Or the model starts out providing for a current-year budget value, but later we need to turn that into a multi-year funding profile.

By using the structured text format, and using rdf:parseType::Resource as illustrated earlier, making these changes is extremely simple. If the same structural changes need to be made to many different instances, they could be applied either by running regular expressions over the structured text file, or by applying an XSLT transform to the RDF-XML version.

In the latter case, there would need to be a program that could change the RDF-XML form back to structured text. The author has not yet needed to write one, but it should not be hard to implement (and it would only need to be done once). Typically, though, the need for this kind of change shows up before there are many instances in the data set, and at that point the changes can easily be made by hand.
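The regular-expression route can be sketched in Python. The example rewrites every simple funding line in a structured text file into a structured property built with rdf:parseType::Resource. The target layout (funding-profile, date-fy, budget-amount), the three-space indentation step, and the function name restructure are assumptions for illustration, loosely following the funding example in Section 5.1; the fiscal year has to be supplied, since the original line does not contain one.

```python
import re

TEXT = """ObservingSystem::
   rdf:about::central-fargo-obs-sys
   acronym::CFOS
   funding::500k
"""

# Matches a whole "funding::VALUE" line and remembers its indentation.
PATTERN = re.compile(
    r"^(?P<indent>[ \t]*)funding::(?P<amount>\S+)[ \t]*$", re.MULTILINE
)


def restructure(text, fiscal_year):
    """Replace each simple funding line with a structured funding-profile block."""

    def expand(match):
        indent, step = match.group("indent"), "   "
        return "\n".join([
            indent + "funding-profile::",
            indent + step + "rdf:parseType::Resource",
            indent + step + "date-fy::" + fiscal_year,
            indent + step + "budget-amount::" + match.group("amount"),
        ])

    return PATTERN.sub(expand, text)


converted = restructure(TEXT, "2006")
print(converted)
```

Because the pattern captures the existing indentation, the new block lands at the right depth wherever the original line appeared.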

4.4 Reclassifying

Sometimes a class needs to be reparented, or to be declared to be a subclass of a new class. Again, this will be done in one place, by direct manual typing or by a search-and-replace operation. The same applies to properties in the event that a taxonomy of property types becomes needed. These changes can be done even when the new class is in a different module. The corresponding operation can be more difficult with UML or EA tools.

4.5 Accidental Duplication

If one accidentally inserts an instance more than once, then the duplicates will use either the same identifier or different ones. If the former, no real harm is done, although some of the resource's properties might appear twice. Some programs might display the duplicates, and some might not. The condition can be corrected when convenient.

If the duplicated instances use different identifiers, one could use a global search-and-replace to change all references to one of them. Alternatively, one could use the OWL property owl:sameAs to declare that the two instances are really the same. An OWL-aware processor would understand, and similar logic could be written into any custom programs or XSLT transformations used in processing the data.
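The kind of logic a custom program might use for owl:sameAs can be sketched as follows. Triples are represented as plain Python tuples, and each group of identifiers declared to be the same is collapsed onto one representative. The function names and the duplicate identifier office-fed-sys are hypothetical, and a real OWL-aware processor does far more than this.

```python
def canonical_ids(same_as_pairs):
    """Map every identifier to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we go
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The first identifier's group keeps its representative.
            parent[root_b] = root_a
    return {x: find(x) for x in parent}


def merge_triples(triples, same_as_pairs):
    """Rewrite subjects and objects to their representatives and drop duplicates.

    Note: an object is rewritten whenever it equals a known identifier, so this
    sketch assumes literal values never collide with identifiers.
    """
    canon = canonical_ids(same_as_pairs)
    merged = {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples}
    return sorted(merged)


# Hypothetical data: the same organization entered under two identifiers.
triples = [
    ("ofs", "acronym", "OFS"),
    ("office-fed-sys", "member-of", "fed-umbr-org"),
    ("ofs", "member-of", "fed-umbr-org"),
]
merged = merge_triples(triples, [("ofs", "office-fed-sys")])
print(merged)
```

After merging, the two member-of statements collapse into one, and everything known about either identifier is attached to the single representative.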

5.0 Case Study - Populating And Evolving The Model

The model fragment depicted in Figure 1 evolved over time, in the manner discussed above in Section “4.0 How RDF Adjusts To Evolving Models”. In this section, a few of the changes are illustrated as a kind of case study. Only a small portion of the changes can be shown, to keep the material short. Identifying details have been changed, but the actual model did go through the evolution discussed. Only the relevant parts of the model are shown, for simplicity.

5.1 Example: Funding Profiles

In populating the Observing System part of the model, we started with the following data:

ObservingSystem::
   rdf:about::central-fargo-obs-sys
   rdfs:label::Central and Fargo Observing System
   acronym::CFOS
   funding::500k

We soon realised that it was important to capture funding as a multi-year profile, since this model would be used to help with long-term planning. So funding was changed from a simple property to a structured one, with each entry having an amount and a fiscal year. We suspected that the funding profile might become even more complex as we learned more, so we changed the model yet again:

ObservingSystem::
   rdf:about::central-fargo-obs-sys
   rdfs:label::Central and Fargo Observing System
   acronym::CFOS
   funding-profile::
      funding-fy-profile
         date-fy::2006
         budget-amount::500k
      funding-fy-profile
         date-fy::2005
         budget-amount::300k

Next, we realized that we needed to indicate the sources of funding:

ObservingSystem::
   rdf:about::central-fargo-obs-sys
   rdfs:label::Central and Fargo Observing System
   acronym::CFOS
   funding-profile::
      funding-fy-profile
         date-fy::2006
         budget-amount::500k
         funding-org::
             rdf:resource::big-govt-agency
      funding-fy-profile
         date-fy::2005
         budget-amount::300k

Probably in the future we would have to further detail the budget amounts by funding source, but at the time, the data was not available. Note how the funding organization is referenced. The agency in question was already in the dataset, in a different module. Otherwise, we would have added it, although that could have come later.

Next we were informed that the client insisted that all funding be classified as operational or new acquisition. Here is the model after making the changes:

ObservingSystem::
   rdf:about::central-fargo-obs-sys
   rdfs:label::Central and Fargo Observing System
   acronym::CFOS
   funding-profile::
      funding-fy-profile
         date-fy::2006
         budget-amount::
            operational::400k
            new-acquisition::100k
         funding-org::
            rdf:resource::big-govt-agency
      funding-fy-profile
         date-fy::2005
         budget-amount::
            operational::250k
            new-acquisition::150k
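
The listings above are regular enough to be read mechanically. The following Python fragment is a minimal sketch, not the translator used in the project: it assumes that every line is either a bare name or a name followed by "::" and an optional value, and that nesting is expressed purely by indentation. The names Node, parse_structured_text and flatten are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One line of structured text: a name, an optional value, nested lines."""
    name: str
    value: str = ""
    children: list = field(default_factory=list)  # list of Node


def parse_structured_text(text):
    """Parse indented 'name::value' lines into a tree, using indentation depth."""
    roots = []
    open_nodes = []  # (indent, node) for the ancestors of the current line
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        name, _, value = raw.strip().partition("::")
        node = Node(name.strip(), value.strip())
        # Close every open node that is not a strict ancestor of this line.
        while open_nodes and open_nodes[-1][0] >= indent:
            open_nodes.pop()
        (open_nodes[-1][1].children if open_nodes else roots).append(node)
        open_nodes.append((indent, node))
    return roots


def flatten(nodes, prefix=()):
    """Yield (path, value) for every line that carries a value."""
    for node in nodes:
        path = prefix + (node.name,)
        if node.value:
            yield path, node.value
        yield from flatten(node.children, path)


SAMPLE = """\
ObservingSystem::
   rdf:about::central-fargo-obs-sys
   acronym::CFOS
   funding-profile::
      funding-fy-profile
         date-fy::2006
         budget-amount::
            operational::400k
            new-acquisition::100k
      funding-fy-profile
         date-fy::2005
         budget-amount::
            operational::250k
            new-acquisition::150k
"""

tree = parse_structured_text(SAMPLE)
for path, value in flatten(tree):
    print(" / ".join(path), "=", value)
```

Because the parser knows nothing about the property names, none of the changes described in this section would require it to be modified.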

5.2 Discussion

The evolution of the model shown above is very normal. Changes had to be made as the data was collected, to accommodate the actual nature of the data. More changes had to be made because of preferences of the client. Note that, although new properties were indeed added, the structure had to change as well. The changes were very easy to make, simply by typing them in almost exactly the same way as if the dataset were an ordinary text document, say, part of a report.

In the actual project, the data was written by hand as RDF-XML, structured as set forth in this paper. Were the work to be repeated, the author would use the structured text format instead. At each step of this evolution, the dataset was cast into an easy-to-read display by applying an XSLT transformation, as discussed in the next section.

6.0 Transforms and Templates For Presenting The Model

When a model is small, the readability of the structured text format is so good that nothing more may be needed. Eventually the model becomes too large to be absorbed in this way. Even after it has been split into modules, more kinds of displays and presentations become necessary. Also, the data needs to be shown in different ways for different purposes.

6.1 Data Presentation - Essential For Communication

A good presentation is essential for communicating about the model. It is also useful in checking the structure and details. Variations on indented listings are often the best way to present this kind of data, when the focus is on the detail of the instances. Graphical presentations can be good for high-level views of the design.

Specific views of the data will include a class hierarchy, a listing of top-level classes, a list of all properties used by each class (without duplication), a list of instance data (just the entities without detail), and a complete display of all the data. Data templates are special views that display the unpopulated data structures, so that instance data may be collected.

6.2 XSLT For Extracting and Formatting

XSLT is a good tool for creating views of the data because it operates on the RDF-XML form of the dataset, which is generated by running the translator on the structured text dataset. XSLT is feasible to use because the structure of the RDF used here is so restricted and regular. XSLT can be used to create XML, HTML, SVG, text, and other output formats.

An XSLT transformation can be somewhat challenging to develop, but once available, it can be easily run from the command line, or by means of a batch file. The principal challenges in developing an XSLT transformation are 1) to operate on the potentially recursive data structures, 2) to obtain labels for the entities to insert where needed, and 3) to create lists of unique elements or properties for use in, for example, property listings and data templates. There are standard ways to do all these things in XSLT.
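
To make the three challenges concrete, here is a rough Python analogue that uses only the standard library. It is a sketch and not one of the project's XSLT transforms. The sample RDF-XML, the namespace http://example.org/ea#, the second system other-obs-sys, and all function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/ea#"  # hypothetical namespace for the model's own terms

SAMPLE = f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns="{EX}">
  <ObservingSystem rdf:about="central-fargo-obs-sys">
    <rdfs:label>Central and Fargo Observing System</rdfs:label>
    <acronym>CFOS</acronym>
    <funding-org rdf:resource="big-govt-agency"/>
  </ObservingSystem>
  <ObservingSystem rdf:about="other-obs-sys">
    <acronym>OOS</acronym>
  </ObservingSystem>
  <Organization rdf:about="big-govt-agency">
    <rdfs:label>Big Government Agency</rdfs:label>
  </Organization>
</rdf:RDF>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def labels_by_id(root):
    """Challenge 2: map each rdf:about identifier to its rdfs:label, if any."""
    labels = {}
    for el in root.iter():
        about = el.get(f"{{{RDF}}}about")
        label = el.find(f"{{{RDFS}}}label")
        if about is not None and label is not None and label.text:
            labels[about] = label.text.strip()
    return labels


def properties_by_class(root):
    """Challenge 3: unique property names per class, in first-seen order."""
    props = {}
    for node in root:
        seen = props.setdefault(local_name(node.tag), [])
        for prop in node:
            name = local_name(prop.tag)
            if name not in seen:
                seen.append(name)
    return props


def outline(el, labels, depth=0):
    """Challenge 1: walk the possibly nested structure recursively."""
    ident = el.get(f"{{{RDF}}}resource") or el.get(f"{{{RDF}}}about")
    if ident is not None:
        shown = labels.get(ident, ident)  # fall back to the identifier
    else:
        shown = (el.text or "").strip()
    lines = ["  " * depth + f"{local_name(el.tag)}: {shown}".rstrip()]
    for child in el:
        lines.extend(outline(child, labels, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
labels = labels_by_id(root)
props = properties_by_class(root)
print("\n".join(outline(root[0], labels)))
```

In XSLT the same three jobs would typically be done with recursive templates, key lookups, and a grouping idiom, respectively.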

To create a view of only part of the data, for example, only the Systems, it is possible to build the RDF-XML file using only one or two of the modules. For example, one could use just the class hierarchy and the Systems modules (the class module would supply readable names for the entities in the Systems module).
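
As a sketch of the idea, the Python fragment below assembles a partial dataset from selected modules by copying their top-level nodes under a single rdf:RDF root. It is an illustration only; the project itself combined modules by other means, and the module contents, the namespace http://example.org/ea#, and the function name combine_modules are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/ea#"  # hypothetical namespace for the model's own terms

# Hypothetical module contents, each a complete rdf:RDF document.
MODULES = {
    "classes": f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:about="ObservingSystem">
    <rdfs:label>Observing System</rdfs:label>
  </rdfs:Class>
</rdf:RDF>""",
    "systems": f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{EX}">
  <ObservingSystem rdf:about="central-fargo-obs-sys">
    <acronym>CFOS</acronym>
  </ObservingSystem>
</rdf:RDF>""",
    "organizations": f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{EX}">
  <Organization rdf:about="big-govt-agency"/>
</rdf:RDF>""",
}


def combine_modules(module_xml):
    """Copy the top-level nodes of several rdf:RDF documents under one root."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    combined = ET.Element(f"{{{RDF}}}RDF")
    for xml_text in module_xml:
        combined.extend(list(ET.fromstring(xml_text)))
    return combined


# A Systems-only view: the class module plus the Systems module.
view = combine_modules([MODULES["classes"], MODULES["systems"]])
print(ET.tostring(view, encoding="unicode"))
```

The combined document can then be handed to the same transforms that are used on the full dataset.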

6.3 Data Collection Templates

In a large project, it may be necessary to create a number of data collection templates. The templates will get filled out by relatively unskilled staff who may collect data from telephone interviews or from web sites. Although a web form might be created for this purpose, in many projects the templates will have to be paper or word processing documents. There are several requirements for such templates.

First, they must have a field for each item in the model, without duplication. An XSLT transformation can produce such a template. Second, it must be as easy as possible to transfer the data into the actual dataset. The best way to do this is to structure the appearance of the template so that it looks the same as the structured text format. This minimizes the difficulty of transcription. It will also be easy to produce from an on-line data collection, should the data be collected using web forms.

Third, there must be provisions for adding new kinds of data not presently in the model. This last is in case the data collector discovers new but apparently relevant data. By providing extra lines in the template, and instructing the data collector to write the newly discovered data in the same way as all the rest, the new data will be already formatted for addition to the dataset with minimal effort.

For all these reasons, the structured text/RDF approach supports the creation and use of data collection templates very well.
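
The three requirements can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the XSLT transform used in the project: it works directly on populated structured text, assumes the "::" and indentation conventions of the earlier listings, and uses a hypothetical function name (blank_template) and a hypothetical placeholder for the extra lines.

```python
def blank_template(populated_text, extra_lines=2, step=3):
    """Build an unpopulated template that looks like the structured text format."""
    fields = {}      # nested dicts: line head -> child fields, in first-seen order
    open_nodes = []  # (indent, subtree) for the ancestors of the current line
    for raw in populated_text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        name, sep, _value = stripped.partition("::")
        head = name.strip() + sep  # keep the name and '::', drop the value
        while open_nodes and open_nodes[-1][0] >= indent:
            open_nodes.pop()
        parent = open_nodes[-1][1] if open_nodes else fields
        # setdefault reuses a field that was already seen, so nothing is duplicated.
        open_nodes.append((indent, parent.setdefault(head, {})))

    lines = []

    def emit(level, depth):
        for head, children in level.items():
            lines.append(" " * (step * depth) + head)
            emit(children, depth + 1)

    emit(fields, 0)
    # Spare lines for data the collector discovers that is not yet in the model.
    lines.extend(" " * step + "________::" for _ in range(extra_lines))
    return "\n".join(lines)


POPULATED = """\
ObservingSystem::
   rdf:about::central-fargo-obs-sys
   acronym::CFOS
   funding-profile::
      funding-fy-profile
         date-fy::2006
         budget-amount::500k
      funding-fy-profile
         date-fy::2005
         budget-amount::300k
         funding-org::
            rdf:resource::big-govt-agency
"""

print(blank_template(POPULATED))
```

Note that the funding-org field appears in the template even though only one of the two funding entries uses it, and that each field is listed once.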

6.4 Examples

All these examples were created using specific XSLT transformations that created HTML pages.

Figure 4: Class Hierarchy Display

In this case, all the classes have been given readable names using rdfs:label. If they had not, the XSLT transform would have used the class URI identifiers instead.

Figure 5: Property Listing Display

The transform finds the properties for each class by inspecting all the instances of the class in the data set. Therefore, if another property is added at some point, it will be picked up in this listing. If no owl:ObjectProperty has been declared for this property (and so no readable label has been declared), the transform will use the identifier in the instance data for the label.

Figure 6: Data Collection Template

This is one possible template. It would also be possible to use the class properties listing as a template.

Finally, here is a complete listing of the dataset.

Figure 7: Listing Of The Entire Dataset

Note that some of the properties have got ordinary names, while many more display the identifying URI instead. This is because the dataset does not include declarations for some of the properties. In those cases, the transform uses the URI instead, which it gets from the rdf:resource attribute. Examples are member of, which is a declared label of the member-of property, and related-org, which is the relative URI of the corresponding property.

An RDF processor will do the right thing with this dataset, even though there are no declarations for some of the properties. As has been said earlier, the processor will go ahead and add the properties to the dataset anyway, because they have been referred to by an identifier. This characteristic of RDF is one of the reasons it is well suited for this kind of modeling. The model will be correct and usable, with the desired structure, even if incomplete in some details or if there is no ontology at all.

6.5 Graphical Displays

No graphical displays were created for this work, but it is possible to sketch out some possibilities. One difficulty in automatically generating graphics is in computing a good layout. The W3C RDF validator does a fair job, but its products are usually hard to read. Even the best database design programs usually need some hand adjustment after an automatic layout.

Perhaps the simplest way to get a graphical visualization would be to get one as a Scalable Vector Graphics (SVG) file from the W3C validator. The appearance could be modified by hand using a drawing tool that can import SVG.

An XSLT transform could be written to output the dataset in a format used by a program that does spring-loaded layout, like Touchgraph. Whether this kind of layout would be acceptable is questionable. A hyperbolic display generator would be another possibility.

If we accept that some hand layout will be necessary, the task becomes the generation of a file that can be imported by some drawing tool, with a sufficiently good initial layout that there is something to work with. Usually it is better not to try to include all the detail in such graphics. The file format could be SVG or some other well-known format.
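
As an illustration of such a starting point, the Python sketch below writes a small SVG file from a list of statements. The layout is deliberately naive: every node is simply placed on a circle, which is a stand-in chosen for brevity and not a spring or hyperbolic layout. The statements, the identifier other-agency, and the function name are hypothetical.

```python
import math
from xml.sax.saxutils import escape

# Hypothetical statements: (subject, property, object).
TRIPLES = [
    ("central-fargo-obs-sys", "funding-org", "big-govt-agency"),
    ("big-govt-agency", "related-org", "other-agency"),
]


def triples_to_svg(triples, radius=150, size=400):
    """Place every node on a circle and draw one labeled line per statement."""
    nodes = []
    for subject, _, obj in triples:
        for node in (subject, obj):
            if node not in nodes:
                nodes.append(node)
    center = size / 2
    pos = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        pos[node] = (center + radius * math.cos(angle),
                     center + radius * math.sin(angle))
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{size}" height="{size}">']
    for subject, prop, obj in triples:
        (x1, y1), (x2, y2) = pos[subject], pos[obj]
        parts.append(f'<line x1="{x1:.0f}" y1="{y1:.0f}" '
                     f'x2="{x2:.0f}" y2="{y2:.0f}" stroke="black"/>')
        parts.append(f'<text x="{(x1 + x2) / 2:.0f}" y="{(y1 + y2) / 2:.0f}" '
                     f'font-size="10">{escape(prop)}</text>')
    for node, (x, y) in pos.items():
        parts.append(f'<circle cx="{x:.0f}" cy="{y:.0f}" r="4"/>')
        parts.append(f'<text x="{x + 6:.0f}" y="{y:.0f}" '
                     f'font-size="12">{escape(node)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = triples_to_svg(TRIPLES)
print(svg)
```

The output is meant to be opened in a drawing tool and rearranged by hand, which is why only the entities and their connecting properties are drawn.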

Another possibility is to write a transform that outputs XMI, and import that into a UML tool. It might also be possible to write a transform to create a file that the Metis EA tool could import, though that would probably be difficult and take considerable specialized knowledge.

So there are many possibilities, but which ones will turn out to be the best remains to be seen.

7.0 Using The Data

At some point, it may become desirable to store the dataset in something other than the structured text file (or files, if it is split into modules).

7.1 After The Model Stabilizes

Once the model has gone through the initial evolutionary phases, it will begin to stabilize. When the state of the design has settled down, the model could be converted to a relational database schema. RDF lends itself to such a conversion, since the model will essentially be a highly normalized relational model already. See [Passin 2004] for more on the relationship between RDF and relational data models. One could speculate whether such a conversion could be done automatically, perhaps using an XSLT transform.
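
One very simple relational form, shown here only as a sketch, is a single table with one row per RDF statement. The Python fragment below loads a few statements into an in-memory SQLite database and follows a reference with a self-join. The table name, column names, sample statements, and function names are assumptions; a production schema would more likely have one table per class.

```python
import sqlite3

# Hypothetical statements: (subject, property, value).
TRIPLES = [
    ("central-fargo-obs-sys", "rdf:type", "ObservingSystem"),
    ("central-fargo-obs-sys", "acronym", "CFOS"),
    ("central-fargo-obs-sys", "funding-org", "big-govt-agency"),
    ("big-govt-agency", "rdf:type", "Organization"),
    ("big-govt-agency", "rdfs:label", "Big Government Agency"),
]


def load(triples):
    """Store every statement as one row of a three-column table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE statement ("
        " subject TEXT NOT NULL,"
        " property TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (subject, property, value))"
    )
    db.executemany("INSERT INTO statement VALUES (?, ?, ?)", triples)
    return db


def funders(db):
    """Follow each funding-org reference and return the funder's readable label."""
    return db.execute(
        """
        SELECT s.subject, l.value
        FROM statement AS s
        JOIN statement AS l
          ON l.subject = s.value AND l.property = 'rdfs:label'
        WHERE s.property = 'funding-org'
        ORDER BY s.subject
        """
    ).fetchall()


db = load(TRIPLES)
print(funders(db))
```

The self-join plays the same role as the label lookup in the display transforms: a reference is resolved by finding the statements whose subject is the referenced identifier.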

Other alternatives would be a UML tool, as mentioned above, or possibly a more specialized EA tool. Or, since the document format is simple and regular, it would probably be feasible to write a homegrown tool to help with navigating and editing the data, while keeping the data in one file as before.

7.2 Staying With RDF

If the decision is to stay with RDF, the RDF-XML version could be imported into one or another of the existing RDF databases, which could then feed any number of RDF applications, including ones written to support the goals of a particular project. In the long run, this may turn out to be the best approach. But until the tools have been created, it may be best simply to stick with the structured text format as long as possible.

8.0 Limitations and Alternatives

This section discusses some limitations of this approach, and covers some alternative methods and the reasons this method was chosen over them.

8.1 Limitations

One potential limitation is that it can be difficult to keep track of a large number of references to resources. For example, if one is collecting data on Organizations, one may end up needing to refer to dozens or hundreds of URIs that identify those organizations. Such a task is hard for most people. This problem is not unique to the current approach, of course. It can be managed to some extent by creating URIs that are mnemonic. Also, the URIs in question can be extracted using an XSLT transformation, producing a sorted reference list. This is one area that would benefit from additional automatic assistance.
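
A sorted reference list of this kind is easy to produce. The following Python sketch, offered as an analogue of such a transform rather than the transform itself, collects every identifier from the rdf:about and rdf:resource attributes and also reports references that point at nothing defined in the file. The sample data, the identifier state-weather-office, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"

SAMPLE = f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns="http://example.org/ea#">
  <ObservingSystem rdf:about="central-fargo-obs-sys">
    <funding-org rdf:resource="big-govt-agency"/>
    <related-org rdf:resource="state-weather-office"/>
  </ObservingSystem>
  <Organization rdf:about="big-govt-agency"/>
</rdf:RDF>
"""


def reference_lists(rdf_xml):
    """Return (all identifiers, referenced-but-undefined identifiers), both sorted."""
    root = ET.fromstring(rdf_xml)
    defined = {el.get(ABOUT) for el in root.iter() if el.get(ABOUT)}
    referenced = {el.get(RESOURCE) for el in root.iter() if el.get(RESOURCE)}
    return sorted(defined | referenced), sorted(referenced - defined)


all_ids, undefined = reference_lists(SAMPLE)
print("Known identifiers:", ", ".join(all_ids))
print("Referenced but not defined here:", ", ".join(undefined))
```

An undefined reference is not an error in RDF, but listing such references helps catch mistyped identifiers and shows which entities still need to be added to some module.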

Another limitation, or at least an area that has not yet been worked out, is incorporating non-textual information, such as tables or images. Images could be handled by pointing to them with a URI. Tables, on the other hand, are complex structures, and it is not obvious how to incorporate them into the simple, hierarchical format espoused in this paper.

As the model evolves and gets more mature, it may be desirable to import it into an existing modeling system. There are no tools as yet to do this. Either the data would need to be transcribed manually, or some kind of transformation or program would need to be written to accomplish the task. However, it would only have to be written once, since the same structure will be used for a variety of projects.

8.2 Alternative Approaches

Aside from large-scale modeling systems such as Rose, Metis, etc., and UML tools, all of which have been briefly discussed earlier, the most common alternatives for these kinds of modeling tasks are probably spreadsheets, word processing documents, and drawing programs such as PowerPoint, Visio, etc. The whole task could also be done with a custom XML language.

Spreadsheets do not readily capture hierarchical information. It can be done visually, but extracting the hierarchy requires careful scripting. Cross-references are also hard to provide in a way that survives restructuring, and the identifiers are generally local to the spreadsheet. Showing different views of the same data is likely to be difficult. There is no apparent way to create or apply an ontology, to the extent that one is desired. Thus use of a spreadsheet does not seem attractive.

Word Processing Documents suffer from similar problems. One can use an outlining feature to capture the hierarchy, and the hierarchy is usually fairly easy to modify. Extracting the data so it can be used or viewed in other ways is likely to be harder than with a spreadsheet. Depending on the word processor, it may be possible to export as XML and write an XSLT transform to get the data into a useful format.

Drawing Programs are very useful for sketching a high-level starting point. They tend to get too cluttered and too hard to change as the model evolves and becomes populated. Getting the data out into other forms is likely to be extremely hard, if it is feasible at all. Cross references have to be denoted by drawing connecting lines, which are hard to translate into more durable references and identifiers.

Custom XML Formats would seem to be a logical choice, for those with the technical bent to develop them. They would work well, and could be used with the structured text presented in this paper. However, why invent another format when a standard one already exists, one that is very suitable for the purpose? Furthermore, the XML formatted data would have to be transformed into other forms anyway at some point. For example, if one decided to import the data into an RDF system, some means would have to be devised for making the appropriate transformation. Also, it would be harder to get the custom XML format to work easily with OWL ontologies, although it would probably be fairly easy to extract ontology information from the custom format into OWL. Because they understand the semantics and implied relationships in the RDF data, RDF processors can enrich the data set. With a custom XML format, a custom processor would have to be written to achieve these advantages.

All things considered, a custom XML format would be quite feasible and would have some of the advantages of the system presented in this paper. However, it would be non-standard, and all the semantics would have to be carefully worked out. The use of RDF, in contrast, starts with a well-established standard which in turn has well-defined semantics. To sum up, if you want to use an XML format, why not use the standard RDF-XML instead of a custom one? This paper has demonstrated that the RDF-XML format is extremely well suited for certain kinds of modeling and information-collection tasks, and can be easy to use as well. Why not make use of those strengths?

9.0 Conclusions

This paper has demonstrated that RDF is very well suited for supporting a wide range of modeling tasks in the early stages when the model is fluid and rapidly changing. A very restricted subset of RDF is used, together with a very simple structured text format for it. This combination was designed to mimic as closely as possible the natural tendency for people to write structured information as indented hierarchical lists, even when the data is not truly a hierarchy.

The author has found from personal experience that using RDF in this way is extremely helpful when the model is changing and growing every time new data is found to populate it, and when the data structures must be handed off to others to populate with instance data (e.g., by creating data collection templates). As an experiment, he used this method to track and manage the review comments for this paper. With the help of a few tweaked XSLT transforms, this approach was very useful.

Contrary to common belief, it turns out that RDF can be written and maintained by hand with relative ease, at least up to some scaling limit that has yet to be determined. Beyond that, there are many feasible migration strategies to shift the data into other data stores that may be better at storing or manipulating large quantities of data.

The author hopes that this account stimulates a wider use of RDF for practical, real-world projects.


Bibliography

[OWL] Web Ontology Language OWL, http://www.w3.org/2004/OWL/

[Passin 2004] Passin, Thomas B., Explorer's Guide To The Semantic Web, Manning Books, 2004.

[RDFS] RDF Vocabulary Description Language 1.0: RDF Schema, http://www.w3.org/TR/rdf-schema/

[W3C RDF Primer 2004] RDF Primer, The World Wide Web Consortium, http://www.w3.org/TR/REC-rdf-syntax/



Easy RDF For Real-Life System Modeling

Thomas B. Passin [Principal Systems Engineer, Noblis]
pub1@tompassin.net