Versioned Hyperdocuments: Abstract Model

John D. Heintz

Abstract

Hyperdocuments present special requirements for integration with lifecycle and versioning systems. These requirements can best be summarized as "versioned linking", something not addressed by merely versioning the storage objects involved. The overriding challenge in versioned linking is resolving the correct version at the end of a link -- where "correct" is defined by arbitrary business policy. This challenge can be addressed by 1) being able to precisely model a versioned hyperdocument system and 2) having rich enough versioning semantics to support the desired business policy. SnapCM is a versioning model that provides clear version and version-aware reference semantics that can be used to satisfy both of these needs.

This paper presents three models that build on each other. The first is a partial object model for hyperdocuments. This first model isn't a complete hyperdocument model, but rather contains only the types and specification necessary to support the two models that follow. The second model defines a versioned hyperdocument model by extending both the first model and SnapCM. The third model extends the versioned hyperdocument model to include specifications for RTDs (Referent Tracking Documents).

These models enable us to satisfy both of the versioned linking challenges. They provide well-defined support for hyperdocument lifecycle systems through a coherent object model that covers both the hyperdocument and SnapCM versioning domains. These models also provide rich business policy support based on SnapCM's ResolutionPolicy semantics.

Keywords: Content Management; Modeling

John D. Heintz

John D. Heintz is a Senior Consultant at ISOGEN International, LLC. He has over seven years of experience in software development and formal modeling. For the last two years John has focused on the versioning and configuration management of hyperdocument systems and their integration with other repositories. When John isn't wrestling with these abstract ideas he is a loving husband, a proud father of a one-year-old son, and a dog owner.

John D. Heintz [ISOGEN International, LLC]

Extreme Markup Languages 2002® (Montréal, Québec)

Copyright © 2002 John D. Heintz. Reproduced with permission.

Introduction and Goals

Information management systems support a variety of lifecycle capabilities: from no lifecycle support (static content), to controlled modification of content, to controlled concurrent modification and historical access of content. This paper is targeted at systems that must manage the controlled concurrent modification or historical access of highly linked content. Such systems often need very specific control over the temporal and parallel characteristics of both the content and linking relationships between the content.

For the purposes of this paper, a document is defined as a container of information or content [Note 1]. A hyperdocument is a special document that additionally specifies linking information with or about other documents. Finally, lifecycle is defined as a function of the business process and versioning capabilities in a given system, where business process is all those business policy rules that govern the creation and modification of information.

It is important to understand that the business process can't in general be independent of the versioning capabilities, but instead must often be defined in terms of those capabilities [Note 2]. This critical detail led to the development of [SnapCM], a versioning model with explicit support for temporal and branched content with versioned linking resolution.

Versioning is an essential part of the solution to some business process challenges. Back-tracking of content in a "workflow" to previous steps in the route (as a result of rejection) requires access to past versions. Parallel development and later merging of the same content resource requires managing variants of that resource. Access to historical content for processing obviously requires sufficient version history management. These examples only indicate that documents must be versioned to satisfy the business process requirements. However, when documents are involved in hyperlinked systems the business process challenge is compounded: versioned links must resolve to the correct version given the context within which the link is being resolved. Obviously, it is the responsibility of the business process to define "correct" in this context.

As an example of what can fail in a hyperdocument lifecycle system consider the following HTML fragment in the context of a versioned web server:

Figure 1: Example HTML fragment.
<html>
  ...
  <p>
    Adjust part X to align with part Y as described in
    <a href="./g348.html#a45"/>
  </p>
    ...
</html>

The hyperlink specified in Figure 1 can fail in several ways. The most commonly known failure would be "404 File Not Found". This obvious and abrupt link resolution failure is probably the safest too: the user knows immediately and without question that the link resolution failed. Another failure mode would be resolving to the wrong version of the "g348.html" document. Finally, while a correct version of the "g348.html" document may get resolved, the content identified by "a45" may have changed during its evolution and may no longer match the intended target of the hyperlink. These last two failure modes are much more dangerous -- the user doesn't receive immediate feedback that link resolution has failed and may proceed with incorrect information. If the preceding HTML file was part of a mission-critical maintenance manual, the consequences of a versioned link resolution failure could be quite serious.

Any system that manages hyperdocuments needs to address the issues caused by the lifecycle changes of linked documents. The example HTML in Figure 1, while trivially simple, can still exhibit all three versioned link resolution failures. Defining explicit modeling support for versioned linking is a necessary first step in designing business processes and supporting systems to address these problems.

About this paper

Versioned linking is fundamentally a problem of context (time, branch, structure, business domain) and URI resolution. This problem is best shown with simple examples; therefore HTML (and [XInclude] for the RTD [Referent Tracking Document][RTD] section) is used in place of the more powerful and complex [HyTime] and [XLink] standards.

The designs specified in this document are defined using the UML [Unified Modeling Language][UML], and particularly using OCL [Object Constraint Language]. The OCL specifications, while provided inline, are also informally described in the text so readers can safely skip over the OCL invariants without losing information.

The models in this paper use several UML diagram and figure types. The types of diagrams found in this paper are the Class diagram, the Instance diagram, and a modified (grid shaped) Instance diagram optimized for presentation of Versions on Snapshots. When referring to types from the models an initial capital letter is used such as "Document".

The Class diagram is used to describe types (or classifications) of objects. An example diagram will help show the syntax:

[Figure: example Class diagram]

The Generalization relationship is shown between a "Supertype" and a "Subtype". The Association relationship is used to specify properties and navigation. Sometimes an Association has a black diamond to represent a composite relationship from a type to a "Contained" type that signifies ownership/containment. Finally, Associations can have a qualifier type at one end that specifies that "Sometype" is qualified by "QualifierType" (i.e. associative array or dictionary).

The Instance diagram is used to show example objects from the Class diagrams. Instances have the syntax "name:Type", with ":Type" representing an anonymous instance.

Finally, the Grid Instance diagrams are like Instance diagrams except they imply additional instance relationships based on vertical alignment with a Snapshot instance and box type. Version instances lined up vertically on a Snapshot instance are all in the Snapshot's effectiveVersions property set. Version instances that have solid box lines (instead of dotted) are in the Snapshot's createdVersions property.

Example Versioned Hyperdocument System: Legislative Documents

A more complete example can illustrate the problems and expected behavior of a hyperdocument system's versioning functionality. Consider a two-chamber legislature that uses a markup-based authoring system to single-source content. In this legislature, bills (legislation) are

  1. introduced in one chamber
  2. revised and re-read in the same chamber any number of times
  3. sent to joint chamber committee where multiple variants of the bill vie for support
  4. exit from committee as approved legislation

What would such a bill look like? Something perhaps as simple as:

Figure 2: bill123.html -- Bill 123 Hub Document. [Version 1]
<html>
  <head>
    <title>Bill 123</title>
  </head>
  <body>
    ...
    <a href="uri::repository:section1.html"/>
    ...
  </body>
</html>

In this example the sections contain the actual content of the bill and are linked to by a master (hub) document. The custom URI indicates repository hyperlink target resolution. [While a true production system would use some transclusion mechanism ([XInclude] or [HyTime] value-ref), the versioned link resolutions are no different from HTML -- and HTML anchor 'a' tags are simply easier to describe a problem with.]

Assume the content of the first section in the repository to be:

Figure 3: section1.html -- Section 1 of Bill 123. [Version 1]
<html>
  <head>
    <title>Section 1 of Bill 123</title>
  </head>
  <body>
    <p>
      Hereafter the sales tax will be 5%.
    </p>
  </body>
</html>

In January a legislator presents "Bill 123" on the floor (of the first chamber). This is the "logical" version of the bill, composed of the first versions of both the hub and section documents. This logical version is identified as the "introduced" version.

In February the same legislator creates a new version of "Section 1" replacing "5%" with "6%". This new logical version of the bill is again presented on the floor as the "second reading". Notice that only a new version of the section was created, not a new version of the bill hub document.

This introduces the first challenge. Assume that a legal researcher needs access to both the "introduced" and "second reading" versions to compare/contrast them. The repository must provide the capability to resolve links from the hub bill document relative to the time "January" in order to support this requirement.

In March the "second reading" logical version of the bill is moved to joint committee. In committee two different legislators create competing variations of the bill. The first legislator modifies the content of "Section 1" by replacing "6%" with "3%" and the second replaces "6%" with "9%".

Another challenge: The authoring staff for each legislator needs to see and author against the correct version of the section. When the first legislator's authoring staff views the "Bill 123" hub document they need the hyperlink to resolve to the version of "Section 1" with the content, "3%" -- even if the other variation of "Section 1" has the most recent creation timestamp. Notice that there still only exists a single version of the hub document "Bill 123".

In April both of these legislators make further revisions: the first to "5%" and the second to "7%".

In May the two variants of "Section 1" are merged and the final logical version of the bill comes out of committee with "6%" in the content of "Section 1".

The legislative researchers' challenge still exists but is complicated by variations of the section. Researchers need to be able to navigate the hyperlink from the single version of the hub document to seven different versions of the section document, all depending on the context of their research.

This example presents two contexts that affect the resolution of links: time and branch. This paper also identifies a context of structure (which hyperlink is being resolved) and a business domain context (anything from the business model, such as authentication/authorization). The first two contexts (time and branch) are fundamental to versioned link resolution, while the latter contexts are both optional and specific to each system.
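The time and branch contexts of the legislative example can be illustrated with a small Python sketch. It is not the SnapCM Snapshot mechanism: the branch names, the month numbering, the PARENT table, and the resolve function are all assumptions made for illustration. The sketch shows how one unchanged hub link can reach each of the seven versions of "Section 1" depending on context.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SectionVersion:
    branch: str  # hypothetical branch name
    month: int   # 1 = January ... 5 = May
    text: str    # the sales tax rate stated in "Section 1"


# The seven versions of "Section 1" described in the narrative above.
HISTORY = [
    SectionVersion("floor", 1, "5%"),         # introduced
    SectionVersion("floor", 2, "6%"),         # second reading
    SectionVersion("legislator-1", 3, "3%"),
    SectionVersion("legislator-2", 3, "9%"),
    SectionVersion("legislator-1", 4, "5%"),
    SectionVersion("legislator-2", 4, "7%"),
    SectionVersion("committee", 5, "6%"),     # merged result
]

# Assumed branch ancestry: every committee branch starts from the floor.
PARENT = {
    "legislator-1": "floor",
    "legislator-2": "floor",
    "committee": "floor",
}


def resolve(branch: str, month: int) -> Optional[SectionVersion]:
    """Return the latest version visible on `branch` at `month`.

    If the branch has no version yet at that time, fall back to its
    parent branch. This is a simplification that works here because
    the floor branch does not change after the committee forks.
    """
    current: Optional[str] = branch
    while current is not None:
        candidates = [
            v for v in HISTORY if v.branch == current and v.month <= month
        ]
        if candidates:
            return max(candidates, key=lambda v: v.month)
        current = PARENT.get(current)
    return None
```

For example, resolve("legislator-1", 3) yields the "3%" version even though the "9%" variant exists at the same time on another branch, and resolve("floor", 1) yields the "introduced" version that the legal researcher needs.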

Applicability to non-markup systems

While the examples in this paper use markup languages, and the discussion revolves around hyperdocuments, the issues are not limited to markup systems alone. By choosing definitions that are correct for hyperdocument markup systems but still sufficiently general, the models in this paper can provide support for the design and implementation of lifecycle issues in any information management system. To support this goal the definitions of document, hyperdocument, and lifecycle are intentionally abstract -- indeed they say little more than "information", "associated information", and the "rules of modifying information and recording those modifications."

With these abstract definitions database systems such as RDBMS [Relational Database Management System] can be modeled as hyperdocument systems -- with all of the associated lifecycle problems and solutions. Each row in a relational database can represent the content of a single document and each foreign key in that row can represent a unidirectional hyperlink to another document. This same logic can be extended to other types of database such as LDAP [Lightweight Directory Access Protocol] by arbitrarily deciding what constitutes a "document" and a "link" in the database.
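As a rough illustration of the relational reading, the following Python sketch uses the standard sqlite3 module to treat rows as documents and a foreign key as a unidirectional hyperlink. The table names, column names, and helper functions are hypothetical and chosen only for this example.

```python
import sqlite3


def build_repository() -> sqlite3.Connection:
    """Create an in-memory database with two 'document' tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE part (
            id INTEGER PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE maintenance_step (
            id INTEGER PRIMARY KEY,
            instruction TEXT NOT NULL,
            -- the foreign key plays the role of a unidirectional hyperlink
            part_id INTEGER NOT NULL REFERENCES part(id)
        );
        INSERT INTO part VALUES (45, 'circular widget');
        INSERT INTO maintenance_step VALUES (1, 'Identify the part', 45);
        """
    )
    return conn


def follow_link(conn: sqlite3.Connection, step_id: int):
    """Resolve the 'hyperlink' from a maintenance step to its part row."""
    row = conn.execute(
        "SELECT part.description FROM maintenance_step "
        "JOIN part ON part.id = maintenance_step.part_id "
        "WHERE maintenance_step.id = ?",
        (step_id,),
    ).fetchone()
    return None if row is None else row[0]
```

In this sketch an in-place UPDATE of the part row silently changes what the link resolves to, and no earlier version can be recovered. That is the same "wrong version" failure described for the HTML example.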

Hyperdocument Model

A Document is a container of information or content. A Hyperdocument is a type of Document that additionally specifies arbitrary linking information with or about other Documents.

The application of these definitions is straightforward for markup systems like HTML, [XML], and SGML. For example, in the case of [XML], "content" can refer to the nodes in the DOM tree for a given Document.

A Hyperdocument is an information object that specifies arbitrary linking information with or about other information objects. Hyperdocuments enable the creation of useful and rich information management systems. Examples include: single-source authoring systems, annotation systems, parts-database systems, and Topic Map [TM] systems.

Hyperdocuments are typically known from the HTML, [XLink], and [HyTime] specifications. These specifications serve the purpose of providing standard syntax and semantics for interchange and processing of statically linked information very well. However, these standards don't directly lend themselves to the discussion and modeling of lifecycle problems.

The remainder of this section defines an object model for hyperdocuments in order to provide a concise vocabulary for discussing general versioned hyperdocument solutions separate from the syntax of a particular markup language.

Hyperdocuments

Figure 4: Hyperdocument Model

Figure 4 defines both Documents and Hyperdocuments. Documents are defined in terms of their content -- a composed set of Content objects. [Note 3] The Document.content property is the set of all Content objects owned by the Document. Content itself is left unspecified -- this minimal amount of information is sufficient for the needs of this paper.

Hyperdocument is a subtype of Document that additionally contains a set of Hyperlinks.

Hyperlinks

Figure 5 expands on the Hyperlink type introduced in Figure 4. Hyperlinks contain Anchors, which are the "ends" of a Hyperlink. The Anchor.content property is the list of Content objects that are "anchored" by the end of a Hyperlink.

Figure 5: Hyperlinks Model

Anchors and Addresses

Figure 6 expands on the Anchor type further and introduces the Address and Location types. An Anchor contains a list of Addresses. Each Address is associated to a Document via a Location.

Figure 6: Anchor/Address Model

The Address.content property identifies a list of Content objects in the content of the Address' associated Document. In OCL:

context Address
inv limit_content :
self.location.document.content->includesAll(self.content)

The anchoredContent association is shown again here for clarity. With this figure's information, the value of Anchor.content can be defined as the concatenation of the content lists of all of the Anchor's Addresses. In OCL:

context Anchor
inv define_content :
self.content = self.addresses->iterate(
  address : Address; accumulator : Sequence(Content) = Sequence{} |
  accumulator->union(address.content))
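The two invariants above can also be read as executable checks. The following Python sketch is only one possible rendering of the Document, Location, Address, and Anchor types; the class layout and the method name check_limit_content are assumptions, not part of the model. Objects are compared by identity, which approximates ownership of Content by a Document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: instances compare by identity
class Content:
    text: str


@dataclass(eq=False)
class Document:
    content: List[Content] = field(default_factory=list)


@dataclass(eq=False)
class Location:
    document: Document


@dataclass(eq=False)
class Address:
    location: Location
    content: List[Content]

    def check_limit_content(self) -> bool:
        """Mirror of the limit_content invariant: every addressed
        Content object must be owned by the located Document."""
        owned = self.location.document.content
        return all(item in owned for item in self.content)


@dataclass(eq=False)
class Anchor:
    addresses: List[Address] = field(default_factory=list)

    @property
    def content(self) -> List[Content]:
        """Mirror of the define_content invariant: the concatenation
        of the content lists of all Addresses, in order."""
        result: List[Content] = []
        for address in self.addresses:
            result.extend(address.content)
        return result
```

Deriving Anchor.content as a property rather than storing it keeps the anchored content consistent with the Addresses by construction.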

Hyperdocument Locations

Figure 7 expands on the Location type and introduces two subtypes of Location: ThisLocation and DefinedLocation. Locations are contained by Hyperdocuments. The value of the Location.document property, shown again for clarity, is not specified in the Location type, but rather in its two subtypes.

Figure 7: Hyperdocument Locations Model

ThisLocation always refers to the Document (actually Hyperdocument) that contains it. In OCL:

context ThisLocation
inv define_document :
self.document = self.hyperdocument

DefinedLocation is left unspecified until section “Versioned Hyperdocuments” where a specification based on SnapCM versioning objects is provided.

ValueRef

Figure 8 shows a subtype of Hyperlink named ValueRefHyperlink, two subtypes of Anchor to support ValueRefHyperlink, and the introduction of the effectiveContent association to enable the specification of "value referencing". A ValueRefHyperlink semantically means that the content of the ValueAnchor replaces the content of the RefAnchor. This models traditional transclusion.

Figure 8: ValueRef Model

While the documentContent association, shown again for clarity, is intrinsic to the Document, the effectiveContent association describes a derived set of content starting with the Document.content value and replacing based on ValueRefHyperlinks. In OCL:

context ValueRefHyperlink
inv define_Document_effectiveContent :
self.ref.addresses->forAll(address |
  address.location.document.effectiveContent = 
   (address.location.document.content - address.content)->union(self.value.content))
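A minimal Python sketch of this derivation follows. It uses plain strings as stand-ins for Content objects and the hypothetical function name effective_content. Like the OCL, it removes the content addressed by the RefAnchor and adds the content of the ValueAnchor; it does not splice the value in at the original position.

```python
from typing import List


def effective_content(
    content: List[str], ref: List[str], value: List[str]
) -> List[str]:
    """Compute (content - ref) union value.

    `content` is the Document's own content, `ref` is the content
    addressed by the RefAnchor, and `value` is the content of the
    ValueAnchor. The input lists are not modified.
    """
    remaining = [item for item in content if item not in ref]
    # Union semantics: do not add a value item that is already present.
    return remaining + [item for item in value if item not in remaining]
```

Applied to a hub with the content ["intro", "placeholder", "outro"], a RefAnchor addressing ["placeholder"], and a ValueAnchor holding the text of "Section 1", the effective content contains "intro", "outro", and the section text, while the hub's own content is unchanged.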

Summary

The Hyperdocument Object Model introduces and specifies Documents and Hyperdocuments. Documents are containers of information or Content and Hyperdocuments add Hyperlinks that can anchor Content. Hyperlinks anchor Content through Anchors that use Location and Address objects to identify the anchored Content. Additionally, a flexible transclusion model is defined based on effective Content and a specialized Hyperlink type.

Versioned Hyperdocument Model

Hyperdocuments bring to the forefront the versioned linking challenge: how does the system ensure the members of an anchor resolve to the correct version? The answer to this question depends on the context in which it is asked. The design of a Hyperdocument system includes business policies to answer this question based on the following characteristics of context:

Context of Hyperlink

Context of Hyperlink establishes the context of link resolution: the specific instances of Location, Address, Anchor, Hyperlink, and Hyperdocument that are currently being resolved. In other words, it is the current Address being resolved in the context of a specific Anchor, Hyperlink, and Hyperdocument.

Context of Business Domain

This includes any information specified in the business domain model. Important examples include: currently authenticated user and workflow state.

Context of Time

The point in time, either present or historical, relative to which versioned links are resolved.

Context of Branch

Context of Branch is best illustrated with examples: draft vs. published, release one vs. release two, or customer A vs. customer B. SnapCM represents this context as a Branch object.

The following model provides support for the contexts of Hyperlink, Time, and Branch. The context of Business Domain is a characteristic of each individual system. The business policies for a hyperdocument system are free to use any or all of these contexts to specify which versions a link resolves to within the constraints of the SnapCM model.

This paper and model choose to manage a Hyperdocument as a versioned unit of work with version-aware DefinedLocation support. This choice is the second option of four possible levels of hyperdocument versioning granularity. The four levels are:

Document Versioning

Document Versioning treats the entire Hyperdocument as a versioned resource without making any provisions for version-aware Hyperlinking. This is what most content versioning systems implement.

Hyperlink Versioning

Hyperlink Versioning treats the entire Hyperdocument as a versioned resource with version-aware DefinedLocations.

Sub-Document Versioning

Sub-Document Versioning treats some defined or arbitrary subsets of the Hyperdocument's contained objects as versioned entities. This approach is similar to the previous approach, but disjoint sets of Hyperdocument objects (Content and Hyperlinks) are treated as versioned entities instead of the entire Document. For example, an [XML] structured document could have DOM node subtrees individually versioned, but one DOM node could only exist in a single version at a time.

Content Versioning

All objects contained in the Hyperdocument (Hyperlinks, Anchors, Addresses, Locations, and Content instances) are treated as individual versioned units.

Document Versioning by definition does nothing to address versioned linking and is therefore eliminated. The Content Versioning approach was eliminated because it leads to excessive resource utilization and scalability issues in a versioning system. The Sub-Document Versioning approach is not currently described, but will be the subject of future investigation.

Versioned Hyperdocuments

NOTE:

Types from the SnapCM model are prefixed with the "snapCM::" UML package namespace.

Figure 9 shows the complete integration of the Hyperdocument Model with the SnapCM Model. This model connects Hyperdocuments with SnapCM in only two ways: Document is a subtype of snapCM::Version, and DefinedLocation is associated to a Document via a snapCM::Reference.

Figure 9: Versioned Hyperdocuments Model

Making Document a subtype of Version implies that a Document is a versioned resource and is therefore the unit of granularity in a versioned system. [Note 4] Giving DefinedLocation an association to snapCM::Reference makes the resolution of the Location.document property version-aware. The qualified referenceTarget association is resolved relative to a Snapshot. The resulting model means that any Hyperlink Address that isn't located in the current Document is resolved via a snapCM::Reference to the correct Document (Version). [Note 5]

The snapCM::Reference uses a snapCM::ResolutionPolicy object to determine the correct Document. This is the focal point for business logic to specify how the correct Document version is chosen. System design can result in arbitrary subtypes of snapCM::ResolutionPolicy to satisfy business requirements. Examples include: a fixed version, the effective version on the resolved snapCM::Snapshot, the current version on the snapCM::Branch, the version with workflow state "xyz", and so on.

The DefinedLocation.document property can be defined as the Document that is found through resolving the snapCM::Reference. In OCL:

context DefinedLocation
inv define_document :
self.document = self.reference.target(snapshot)
-- where 'snapshot' is the qualified snapCM::Snapshot

This invariant relies on the specification for snapCM::Reference.target, which in turn relies on the current snapCM::ResolutionPolicy. A pre-defined policy, snapCM::OnSnapshotResolutionPolicy, describes the default policy for snapCM::Reference resolution. This policy resolves to the Document that is effective on the target snapCM::Snapshot. This invariant, repeated from the SnapCM paper, in OCL:

context snapCM::OnSnapshotResolutionPolicy
inv targets_inv :
let snapshot = self.reference.referenceTargets.snapshot in
  self.reference.targets = 
    snapshot.effectiveVersion(self.reference.resource)
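The role of the ResolutionPolicy as the plug-in point for business logic can be sketched in Python. This is not the SnapCM API: the class names, method names, and the use of plain strings for Resources and Versions are assumptions made for illustration. Two of the example policies named above are shown, the effective version on the Snapshot and a fixed version.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class Snapshot:
    name: str
    # Maps a resource name to the version that is effective on this snapshot.
    effective: Dict[str, str] = field(default_factory=dict)

    def effective_version(self, resource: str) -> str:
        return self.effective[resource]


class ResolutionPolicy(Protocol):
    """Anything that can pick a version of a resource for a snapshot."""

    def resolve(self, resource: str, snapshot: Snapshot) -> str: ...


class OnSnapshotPolicy:
    """Resolve to whatever version is effective on the given snapshot."""

    def resolve(self, resource: str, snapshot: Snapshot) -> str:
        return snapshot.effective_version(resource)


@dataclass
class FixedVersionPolicy:
    """Always resolve to one pinned version, ignoring the snapshot."""

    version: str

    def resolve(self, resource: str, snapshot: Snapshot) -> str:
        return self.version


@dataclass
class Reference:
    resource: str
    policy: ResolutionPolicy

    def target(self, snapshot: Snapshot) -> str:
        # The Reference delegates the choice of version to its policy.
        return self.policy.resolve(self.resource, snapshot)
```

Because the Reference only delegates, a system can add further policies (for example one keyed on workflow state) without changing Hyperlinks or Documents.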

Versioned Hyperdocuments Example

The following example is similar to those in the SnapCM paper. Hyperdocuments behave the same way as other Versions in SnapCM. They are effective on some set of Snapshots, have next/previous versions, share snapCM::Resource across all snapCM::Versions, and use version-aware snapCM::References. The only significant difference is that Hyperdocuments associate to snapCM::References through Hyperlinks.

Before a reasonable example diagram can be shown, however, a diagramming shorthand is needed. Without some shorthand the diagram would be cumbersome due to the number of objects from both the Versioned Hyperdocument and SnapCM models. In Figure 10 instances of the Versioned Hyperdocument types and a SnapCM Reference object are shown on the left, while on the right all of the intermediate objects are replaced with a single stereotyped "<<hyperlink>>" association. This stereotype therefore represents all of the objects and associations on the left, not a direct association from a Hyperdocument object to a Reference object.

Figure 10: Versioned Hyperdocuments Stereotype

The second shorthand is designed to simplify how SnapCM References and Policies are displayed. In Figure 11 instances of SnapCM Reference, Resource, Version, and ResolutionPolicy types are shown associated together on the left side. The right side introduces a stereotyped "<<onsnapshot>>" association between the Reference and Version instances, replacing several associations and the policy instance. Intuitively the vertical Snapshot instance resolves a target Version with the OnSnapshotResolutionPolicy (visually indicated by the vertical dashed line).

Figure 11: OnSnapshot Stereotype

Now a clear example of versioned Hyperdocuments can be shown. Figure 12 shows a versioned Hyperdocument and two other Documents that it links to. The effective versions, those that can be accessed by default, and the Reference target for each Snapshot are listed at the bottom of the diagram.

Figure 12: Versioned Hyperdocuments Example

The shorthand stereotypes are used to hide the details of hyperlink and resolution instances. Using the definitions of those stereotypes this diagram could be expanded to contain all of those hidden instances and associations.

Here is a detailed explanation of what is occurring on each Snapshot:

snap1:Snapshot

On snap1:Snapshot two Documents are created, r2doc1:Document and hdoc1:Hyperdocument. One Hyperlink is created that has a DefinedLocation (not shown, part of <<hyperlink>> stereotype) associated with ref1:Reference. This Reference is associated to r2:Resource with an OnSnapshotResolutionPolicy (not shown, part of the <<onsnapshot>> stereotype). Therefore, on snap1:Snapshot the hdoc1:Hyperdocument has a Hyperlink that anchors content in r2doc1:Document.

snap2:Snapshot

On snap2:Snapshot a new version of r2:Resource, r2doc2:Document, is created. This changes the effective Versions on snap2:Snapshot to include this new Version replacing r2doc1:Document. It also means that on snap2:Snapshot the hdoc1:Hyperdocument has a Hyperlink that anchors content in r2doc2:Document, not r2doc1:Document. [Note 6]

snap3:Snapshot

On snap3:Snapshot r2doc1:Document is explicitly set as the effective Version. The hdoc1:Hyperdocument's Hyperlink again anchors content in r2doc1:Document (that is on both snap1:Snapshot and snap3:Snapshot, but not on snap2:Snapshot).

snap4:Snapshot

On snap4:Snapshot two actions are performed: the third Document and a new Version of r1:Resource are created. r3doc1:Document is effective in snap4:Snapshot as well as r2doc1:Document. The newly created Hyperdocument does not continue to use ref1:Reference and has no association to it. Instead, hdoc2:Hyperdocument has separate Hyperlink instances (from the <<hyperlink>> stereotype) and ref2:Reference. This new Reference has an <<onsnapshot>> association to r3doc1:Document. The result of these instances and associations is that the hdoc1:Hyperdocument anchors content in one of the r2:Resource Versions (depending on Snapshot), but its next Version hdoc2:Hyperdocument anchors content in a different Resource's Version, r3doc1:Document.
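The four Snapshots described above can be summarized as a small lookup table. The following Python sketch is transcribed from the prose, not from Figure 12 itself. It assumes that hdoc1 and hdoc2 are the two Versions of r1:Resource and that every Reference uses the on-snapshot policy. The dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# Effective Version of each Resource on each Snapshot, as described above.
EFFECTIVE: Dict[str, Dict[str, str]] = {
    "snap1": {"r1": "hdoc1", "r2": "r2doc1"},
    "snap2": {"r1": "hdoc1", "r2": "r2doc2"},
    "snap3": {"r1": "hdoc1", "r2": "r2doc1"},
    "snap4": {"r1": "hdoc2", "r2": "r2doc1", "r3": "r3doc1"},
}

# The Resource that each Hyperdocument Version's Reference points at:
# hdoc1 uses ref1 (to r2), hdoc2 uses ref2 (to r3).
REFERENCED_RESOURCE: Dict[str, str] = {"hdoc1": "r2", "hdoc2": "r3"}


def reference_target(hyperdocument: str, snapshot: str) -> str:
    """Resolve the Hyperdocument's Reference on the given Snapshot
    using the on-snapshot policy."""
    resource = REFERENCED_RESOURCE[hyperdocument]
    return EFFECTIVE[snapshot][resource]


def link_target(snapshot: str) -> Tuple[str, str]:
    """Return the effective Hyperdocument on the Snapshot together
    with the Document its Hyperlink anchors content in."""
    hub = EFFECTIVE[snapshot]["r1"]
    return hub, reference_target(hub, snapshot)
```

Reading the table row by row reproduces the walk-through: the hub's link lands in r2doc1, then r2doc2, then r2doc1 again, and finally, through the new Hyperdocument Version, in r3doc1.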

Summary

The Versioned Hyperdocument Object Model integrates the [SnapCM] model with the Hyperdocument Object Model by subtyping Document from snapCM::Version and associating DefinedLocation with a snapCM::Reference. The four contexts of versioned link resolution are specified as Hyperlink, Business Domain, Time, and Branch. Versioned link resolution may be specified in business policy by combining or specializing snapCM::Branches, snapCM::Syncs, and snapCM::ResolutionPolicies.

Referent Tracking Documents

The previous models enable the creation of business processes that unambiguously specify versioned link resolution semantics so that all links resolve to the correct version of their target resource. While this is a critical capability, there remains one possible failure mode for link resolution: the content addressed by the link is not the intended content (even in the correct version of the target document). This last problem is not something that can be completely addressed by models, business rules, and technology. Addressing the wrong content is actually an issue of semantics, and therefore a "people problem", because a computer cannot choose or interpret semantics, only the data. An example of this can be demonstrated in HTML. Assume that an existing HTML document looks like:

Figure 13: part-descriptions.html [Version 1]
<html>
  ...
  <a name="id234"/>
  <p>Part 45 is the circular widget.</p>
  ...
  <a name="id236"/>
  <p>Part 46 is the square widget.</p>
  ...
  <p>In conclusion we have described all parts.</p>
  ...
</html>

Assume also that another HTML document refers to this one for the descriptions of parts:

Figure 14: part45-maintenance.html [Version 1]
<html>
  ...
  <p>
    ... identify <a href="part-descriptions.html#id234">Part 45</a>...
  </p>
    ...
</html>

Clearly the intent is that the HTML 'a' element named "id234" is to visually indicate the description of Part 45. This intent is difficult to capture in the markup, however, and if an editor in the system reformats the part-descriptions.html file it would be easy to accidentally misplace a named 'a' element. For example, if the part descriptions were put into a table an accidental copy/paste error could leave behind one or more named 'a' elements:

Figure 15: part-descriptions.html [Version 2]
<html>
  ...
  <a name="id234"/>
  <a name="id236"/>
  ...
  <table>
    <tr>
      <td>Part 45</td><td>is a circular widget</td>
    </tr>
    <tr>
      <td>Part 46</td><td>is a square widget</td>
    </tr>
  </table>
  ...
</html>

When navigating from the part45-maintenance.html document to the part-descriptions.html document, if the resolution policy results in Version 2, a link resolution error occurs. In this case the user will not visually see the description of Part 45.

What can the system do to prevent semantic resolution errors of this kind? [Note 7] The initial response to this question is: provide "where used" information and have editors check the content of documents that "use" what is being modified. This answer works in a large number of cases, but it suffers from some flaws:

  1. it assumes that the editor of a "used" document will understand the semantic intention of the "using" document. While HTML 'a' elements are typically clear enough, complicated [XLink] elements in arbitrary [XML] schemas may not be easy to understand at all.
  2. it assumes that editors have read access to all using documents to inspect them. Security may restrict access to large sets of documents, preventing authors from performing this check.
  3. it assumes that editors can modify any using documents that need linking information updates. While HTML 'a' elements wouldn't require this, [XLink] can identify the targets of a link with an [XPointer] expression, and changes to the target document structure may require those [XPointer] expressions to be modified. Again, security considerations alone may prevent this.

In order to avoid the flaws of only providing authors with "where used" information, the system must provide a mechanism to buffer the actions of "used" document authors from "using" document authors. In short: the maintenance and change of one document must not cause a cascade of inspections and changes to other documents.

RTDs [Referent Tracking Documents][RTD] provide exactly the solution needed. Rather than linking directly into a target document, a using document links to an intermediate RTD. The RTD acts as a proxy resource for a virtual semantic resource implied by the first use of the content in the target document. Each RTD version is thus a proxy for a corresponding version of the virtual content target. Figure 16 shows hyperlinking with and without RTDs.

Figure 16: RTD Example

Given this indirection between using and used documents we can clearly identify the responsibilities of content authors: the responsibility of an author modifying document content is to guarantee that any existing RTDs still link to the current content after any modification, and the responsibility of an author linking into a document is to use an existing RTD or create a new RTD (see Note 8).
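The division of responsibilities can be sketched in a few lines of Python. This is a deliberately simplified illustration that ignores versioning: the RTD dataclass, the documents and rtds dictionaries, and the address keys are all hypothetical names chosen for the example, not part of the RTD specification.

```python
from dataclasses import dataclass


@dataclass
class RTD:
    """Hypothetical proxy: names a target document and an address inside it."""
    target: str    # name of the "used" document
    address: str   # key of the addressed content in that document


# Each document is modeled as a mapping from address to content.
documents = {
    "part-descriptions": {"id234": "Part 45 is the circular widget."},
}
rtds = {"rtd-part45": RTD(target="part-descriptions", address="id234")}


def resolve(rtd_name):
    """Follow a link that points at an RTD through to the addressed content."""
    rtd = rtds[rtd_name]
    return documents[rtd.target][rtd.address]


# The "using" document stores only the RTD name.
using_link = "rtd-part45"
before = resolve(using_link)

# The "used" author restructures the document and repairs the RTD
# as part of the same change. The using document is not touched.
documents["part-descriptions"] = {"row1": "Part 45 is the circular widget."}
rtds["rtd-part45"] = RTD(target="part-descriptions", address="row1")
after = resolve(using_link)

print(before == after)
```

The using document's link is the same string before and after the restructuring; only the RTD, which the modifying author is responsible for, has changed.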

RTDs were originally introduced to solve an additional problem: to model document versioning using only hyperdocuments. Given the modeling support from [SnapCM] and the preceding models in this paper, this additional aspect of RTDs is considered less compelling.

Versioned RTD Model

Integrating RTDs into the Versioned Hyperdocument Object Model turns out to be a small and simple extension. Figure 17 shows the two modeling extensions required: a new subtype of Hyperdocument and a new subtype of snapCM::Reference.

Figure 17: Versioned RTD Model

The RTD type is a specialized versioned Hyperdocument type that has additional restrictions to behave as a placeholder for other Content. It is designed to have one ValueRefHyperlink where the RefAnchor identifies self, and the ValueAnchor identifies the target document using an RTDReference with a FixedVersionResolutionPolicy. These restrictions enable other documents to link to the RTD and specify a standard Address for all effective Content in the RTD -- that is, the content from the RTD's target document (see Note 9). In OCL:

context RTD
inv limit_hyperlinks :
  -- only one hyperlink
  self.hyperlinks->size() = 1 and
  -- and of type ValueRefHyperlink
  self.hyperlinks->forAll(h | h.oclIsKindOf(ValueRefHyperlink))

inv specify_RefAnchor :
  let valref:ValueRefHyperlink =
    self.hyperlinks->asSequence()->first().oclAsType(ValueRefHyperlink) in
  -- RefAnchor.location is of type ThisLocation
  valref.ref.location.oclIsKindOf(ThisLocation) and
  -- and has only one Address
  valref.ref.addresses->size() = 1 and
  -- and Address identifies all local content
  valref.ref.anchors.anchoredContent = self.content

inv specify_ValueAnchor :
  let location:Location =
    self.hyperlinks->asSequence()->first().oclAsType(ValueRefHyperlink).value.location in
  -- location is of type DefinedLocation
  location.oclIsKindOf(DefinedLocation) and
  -- location.reference is of type RTDReference
  location.reference.oclIsKindOf(RTDReference) and
  -- location.reference uses a FixedVersionResolutionPolicy
  location.reference.policy.oclIsKindOf(FixedVersionResolutionPolicy)
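
For readers less familiar with OCL, the same checks can be approximated in Python. The sketch below is an assumption-laden illustration: the class names mirror the types named in the invariants, but their fields are guessed from the OCL navigation expressions, and the final RefAnchor condition (that the Address identifies all local content) is omitted because it depends on Content details not modeled here.

```python
from dataclasses import dataclass
from typing import List


class ResolutionPolicy: pass
class FixedVersionResolutionPolicy(ResolutionPolicy): pass
class OnSnapshotResolutionPolicy(ResolutionPolicy): pass


@dataclass
class Reference:
    policy: ResolutionPolicy

class RTDReference(Reference): pass


class Location: pass
class ThisLocation(Location): pass

@dataclass
class DefinedLocation(Location):
    reference: Reference


@dataclass
class Anchor:
    location: Location
    addresses: List[str]


class Hyperlink: pass

@dataclass
class ValueRefHyperlink(Hyperlink):
    ref: Anchor
    value: Anchor


@dataclass
class RTD:
    hyperlinks: List[Hyperlink]


def violated_invariants(rtd):
    """Return the names of the invariants that the given RTD instance breaks."""
    if len(rtd.hyperlinks) != 1 or not isinstance(rtd.hyperlinks[0], ValueRefHyperlink):
        return ["limit_hyperlinks"]   # the other invariants need the single link
    link = rtd.hyperlinks[0]
    broken = []
    if not (isinstance(link.ref.location, ThisLocation)
            and len(link.ref.addresses) == 1):
        broken.append("specify_RefAnchor")
    loc = link.value.location
    if not (isinstance(loc, DefinedLocation)
            and isinstance(loc.reference, RTDReference)
            and isinstance(loc.reference.policy, FixedVersionResolutionPolicy)):
        broken.append("specify_ValueAnchor")
    return broken


def make_rtd(policy):
    """Build an RTD whose value Anchor uses the given resolution policy."""
    ref = Anchor(ThisLocation(), ["all-content"])
    value = Anchor(DefinedLocation(RTDReference(policy)), ["/*/para"])
    return RTD([ValueRefHyperlink(ref=ref, value=value)])


good = make_rtd(FixedVersionResolutionPolicy())
bad = make_rtd(OnSnapshotResolutionPolicy())
print(violated_invariants(good), violated_invariants(bad))
```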

Documents that reference RTDs can choose any appropriate resolution policy because the RTD uses a fixed resolution policy. If the RTD didn't use a fixed resolution policy then its effective content could vary from Snapshot to Snapshot. This would, for example, prevent a Document from referencing an RTD with a fixed resolution policy successfully. The implication of this is that every occurrence (that is every version) of the content representing the semantic unit needs a corresponding RTD version.
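
The difference between the two kinds of policy can be illustrated with a minimal Python sketch. It assumes a much-reduced view of SnapCM in which a Snapshot is just a mapping from Resource name to the Version visible on that Snapshot; the function names and the sample data are hypothetical.

```python
# Each Snapshot maps a Resource name to the Version visible on that Snapshot.
snapshots = {
    "snap1": {"r3": "doc1"},
    "snap2": {"r3": "doc2"},
}


def resolve_fixed(version):
    """Fixed policy: the Reference names one Version; the Snapshot is irrelevant."""
    def resolve(snapshot):
        return version
    return resolve


def resolve_on_snapshot(resource):
    """On-snapshot policy: look the Resource up on whichever Snapshot is in use."""
    def resolve(snapshot):
        return snapshots[snapshot][resource]
    return resolve


fixed = resolve_fixed("doc1")
floating = resolve_on_snapshot("r3")

fixed_results = [fixed(s) for s in ("snap1", "snap2")]
floating_results = [floating(s) for s in ("snap1", "snap2")]
print(fixed_results, floating_results)
```

Because the fixed reference yields the same Version on every Snapshot, the effective content of a given RTD version never changes, and any variation across Snapshots comes only from which RTD version the using document's own policy selects.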

RTDs always reference other documents with the RTDReference type to enable easy filtering of the snapCM::Version.whereUsed(Snapshot) association. This is merely a convenience for writing business policy models more succinctly.

RTDs in Snapshots

This section presents instance diagrams of RTDs. First a detailed instance diagram shows a single RTD and much of its associated content. Next, a shorthand stereotype is created to make the last diagram more succinct. The last diagram is a SnapCM instance diagram with RTDs plus sample [XML] to associate the instances in the diagram to markup languages. It is important to keep in mind that RTDs accomplish their goals only through the human implication of semantic meaning -- if a person erroneously addresses content the system can never automatically detect it.

Figure 18 shows an instance diagram for a sample RTD. We see that all of the previous constraints are reflected in this model. There is a single hyperlink named referenceLink that is of type ValueRefHyperlink. This hyperlink's ref Anchor anchors the RTD's own content, and the value Anchor uses an RTDReference with a FixedVersionResolutionPolicy to identify a specific target document version.

Figure 18: Example RTD Object Model

Before continuing with a complete RTD-Snapshot example, another diagramming shorthand is needed. In Figure 19 an RTD and associated instances are shown on the left, while on the right all intermediate objects are replaced with a single stereotyped "<<rtdref>>" association. This stereotype represents all objects and associations on the left side, not a direct association from an RTD instance to a Document instance.

Figure 19: RTD Reference Stereotype

The "<<rtdref>>" stereotype is used in Figure 20 to succinctly show an example RTD in the context of Snapshots. This diagram shows a single hyperdocument version using RTDs to track a unit of semantic content across three Versions and two Resources.

Figure 20: RTD Snapshot Example

To connect this SnapCM model with markup, the Documents in this diagram can, for example, be represented by [XML] documents with [XInclude] and [XLink] directives. The hdoc1:Hyperdocument could contain the following [XML]:

Figure 21: hdoc1.xml
<?xml version='1.0'?>
<document xmlns:xlink="http://www.w3.org/1999/xlink">
  <link xlink:type="simple" xlink:href="uri::repository:r2:OnSnapshot"/>
</document>

This [XML] contains a document element and a simple [XLink] addressing content found by resolving a repository URI. This URI specifies the name of an RTD Resource and the resolution policy OnSnapshot for automatic resolution on each Snapshot.

The first RTD version, rtd1:RTD, can be:

Figure 22: rtd1.xml
<?xml version='1.0'?>
<include xmlns="http://www.w3.org/2001/XInclude"
            href="uri::repository:r3:Fixed:doc1#xpointer(/*/para)"/>

The [XML] for the RTD is only an [XInclude] element with a repository URI to specify the name of a Resource, the Fixed resolution policy, the Version that is "fixed" for resolution, and an [XPointer] to identify the concrete address in the target document.
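
The structure of these repository URIs can be made explicit with a small parser. The sketch below is an assumption: the paper shows only example URIs, so the grammar used here (a fixed "uri::repository:" prefix, colon-separated Resource, policy and optional Version, and an optional "#xpointer(...)" fragment) is inferred from those examples, and the names RepositoryURI and parse_repository_uri are hypothetical.

```python
from typing import NamedTuple, Optional


class RepositoryURI(NamedTuple):
    resource: str            # Resource name, e.g. "r3"
    policy: str              # resolution policy name, e.g. "Fixed"
    version: Optional[str]   # pinned Version, present only in the Fixed examples
    pointer: Optional[str]   # body of the xpointer(...) fragment, if any


PREFIX = "uri::repository:"


def parse_repository_uri(uri):
    """Split an example-style repository URI into its components."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a repository URI: {uri!r}")
    body, _, fragment = uri[len(PREFIX):].partition("#")
    parts = body.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected resource:policy[:version] in {uri!r}")
    version = parts[2] if len(parts) == 3 else None
    pointer = None
    if fragment:
        if not (fragment.startswith("xpointer(") and fragment.endswith(")")):
            raise ValueError(f"unsupported fragment in {uri!r}")
        pointer = fragment[len("xpointer("):-1]
    return RepositoryURI(parts[0], parts[1], version, pointer)


rtd_target = parse_repository_uri("uri::repository:r3:Fixed:doc1#xpointer(/*/para)")
hdoc_target = parse_repository_uri("uri::repository:r2:OnSnapshot")
print(rtd_target)
print(hdoc_target)
```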

The initial version of the target document, doc1:Document, can contain the following [XML]:

Figure 23: doc1.xml
<?xml version='1.0'?>
<document>
  <para>Alpha</para>
</document>

On snap1:Snapshot this is the content that is indirectly resolved and included into hdoc1:Hyperdocument. On snap2:Snapshot a new version, doc2:Document, is created with the following [XML]:

Figure 24: doc2.xml
<?xml version='1.0'?>
<document>
  <section>
    <para>Alpha</para>
  </section>
</document>

The corresponding new rtd2:RTD contains:

Figure 25: rtd2.xml
<?xml version='1.0'?>
<include xmlns="http://www.w3.org/2001/XInclude"
            href="uri::repository:r3:Fixed:doc2#xpointer(/*/section/para)"/>

The two changes from the previous RTD are 1) "doc1" -> "doc2" to account for the new fixed target version, and 2) the xpointer now includes "section" in the address.

Finally, on snap3:Snapshot doc2:Document is deleted from the system and other1:Document is created in its place with [XML] content:

Figure 26: other1.xml
<?xml version='1.0'?>
<document>
  <section>
    <para>Alpha</para>
  </section>
  <section>
    <para>Beta</para>
  </section>
  <section>
    <para>Gamma</para>
  </section>
</document>

The corresponding new rtd3:RTD contains:

Figure 27: rtd3.xml
<?xml version='1.0'?>
<include xmlns="http://www.w3.org/2001/XInclude"
            href="uri::repository:r4:Fixed:other1#xpointer(/*/section[1]/para)"/>

The three changes from the previous RTD are 1) "r3" -> "r4" to account for the change of Resource, 2) "doc2" -> "other1" to account for the new fixed target version, and 3) the xpointer now includes a qualifier on "section" in the address to locate only the first section.

This example has shown the benefit of RTDs: both intra- and inter-document modifications occurred without requiring the inspection or modification of hdoc1:Hyperdocument.
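
The whole walkthrough can be replayed in a short Python sketch. It is an illustration under stated assumptions: Snapshots are reduced to a mapping from the RTD Resource r2 to the RTD version visible on each Snapshot, each RTD version is reduced to a (target Version, address) pair, and the addresses are evaluated with the limited path support in xml.etree.ElementTree rather than a real [XPointer] processor. All function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Target document versions (Figures 23, 24, and 26).
versions = {
    "doc1": "<document><para>Alpha</para></document>",
    "doc2": "<document><section><para>Alpha</para></section></document>",
    "other1": ("<document><section><para>Alpha</para></section>"
               "<section><para>Beta</para></section>"
               "<section><para>Gamma</para></section></document>"),
}

# Each RTD version pins one target Version and one address (Figures 22, 25, 27).
rtd_versions = {
    "rtd1": ("doc1", "/*/para"),
    "rtd2": ("doc2", "/*/section/para"),
    "rtd3": ("other1", "/*/section[1]/para"),
}

# The RTD Resource r2 as seen on each Snapshot.
snapshots = {
    "snap1": {"r2": "rtd1"},
    "snap2": {"r2": "rtd2"},
    "snap3": {"r2": "rtd3"},
}


def select(xml_text, pointer):
    """Evaluate the simple '/*/...' addresses used above with ElementTree."""
    root = ET.fromstring(xml_text)
    relative = pointer[len("/*/"):]   # '/*' is the document element itself
    return [element.text for element in root.findall(relative)]


def resolve_hdoc1(snapshot):
    """Resolve hdoc1's OnSnapshot link to r2, then the RTD's fixed reference."""
    rtd = snapshots[snapshot]["r2"]
    version, pointer = rtd_versions[rtd]
    return select(versions[version], pointer)


results = {name: resolve_hdoc1(name) for name in snapshots}
print(results)
```

The link stored in hdoc1 never changes in this sketch; each Snapshot simply exposes a different RTD version, and each RTD version carries the address that is correct for its own fixed target.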

Summary

RTDs enable the creation of proxy Resources for semantically addressed Content. This provides a flexible buffer between "used" and "using" Documents. This buffer reduces maintenance overhead by only requiring authors to ensure that RTDs are consistent, without having to inspect and modify all "using" Documents. RTDs are subtypes of Hyperdocuments that themselves contain the concrete address of semantically identified Content.

Conclusion

This paper provides abstract solutions to linked information lifecycle problems. These solutions are based on the four contexts of link resolution (hyperlink, time, branch, and business domain) that can be analyzed to understand the lifecycle needs of a system. This paper defines a versioned hyperdocument model that builds directly on the [SnapCM] model. This versioned hyperdocument model can be specialized to any business domain given the results of a lifecycle needs analysis for a system. An extension of this model using [RTD]s assists in the task of keeping linked Content semantically intact as it evolves.

The ideas and capabilities of the models in this paper are applicable to a wide range of applications and systems. Markup management systems can clearly and directly refine the lifecycle support in these models. Any system that manages linked and changing information, however, shares the same problems and can benefit from the solutions proposed in this paper.

The three modes of link resolution failure (no version, incorrect version, and correct version but wrong address) can be demonstrated in the most common and simple of systems, as shown in the very first example. Let this common and simple example be a clear warning: information lifecycle issues are everywhere. Very few systems today offer useful support for any but the first mode of link resolution failure. While the current application support for lifecycle issues is sufficient for some applications, it is not enough for many domains (and it is likely that many existing systems have unsupportable lifecycle requirements right now).

Related Work

The first system to support hyperdocument versioning was Xanadu [XAN] by Nelson. The structural issues discussed in [KO] are very similar to those described here. Nested Composite Nodes [Soares] address many of the same issues for hyperdocument version sets that Snapshots and Dependencies do. [VIT] contains a very useful summary of, and reference list for, versioned hypermedia research. Finally, [WebDAV] is today dealing with these issues again for HTTP [Hypertext Transfer Protocol].

Notes

1.

Document as container of content implies that documents are the unit of authoring and storage. This, however, does not imply that a document is necessarily the unit of use and re-use for processing.

2.

Business policy rules that govern the creation and modification of information but only maintain the latest modified content with no history obviously don't depend on versioning capabilities.

3.

A more detailed specification of Document and Content is beyond the scope of this paper. However, those familiar with the DOM [Document Object Model][DOM] or [HyTime] standards can readily refine the Content type to either DOM nodes or Grove nodes.

4.

Choosing the Content Versioning approach would have led to a model that made at least Document and Content subtypes of snapCM::Version, and Document Versioning would have this same Document subtyping snapCM::Version relationship.

5.

The Document Versioning approach would not have this important relation.

6.

Because the Reference uses the OnSnapshotResolutionPolicy the resolution on snap1:Snapshot is unaffected by changes on this Snapshot.

7.

The previous example demonstrated content manipulations within a single document, but the larger issue is that the content representing some semantic unit might be moved to another document entirely. It is also entirely possible that no document's content is modified at all. This would be the case when, for example, an author intends to use an alternative definition of "Part 46". Indeed, being able to track the migration of a semantic unit from document to document is just as important as tracking it within a single document's physical structure.

8.

It is actually a matter of business policy who creates new RTDs. For example, an author who needs a new RTD for some semantic target in a document could, instead of creating the RTD themselves, trigger a workflow process to the appropriate groups. Additionally, business processes could specify periodic maintenance of the RTDs themselves to ensure that duplicates are removed.

9.

This paper has used transclusion to model the behavior of RTDs. Another mechanism could have been indirect addressing where Addresses are chained together to achieve the required indirection. [HyTime] includes indirect addressing, but [XML] has no associated standards that specify this same functionality.


Bibliography

[DOM] Document Object Model, Online from http://www.w3.org/DOM

[HYD] Nelson, Theodor H. 1987. Literary Machines. South Bend, Indiana: The Distributors.

[HyTime] HyTime, Online from http://www.hytime.org

[KO] Kasper Osterbye. Structural and Cognitive Problems in Providing Version Control for Hypertext. Department of Mathematics and Computer Science, Aalborg University. Online from http://www.lirmm.fr/ftp/LIRMM/comp/echt92papers/ECHT92acceptedPapers/KasperOEsterby.ps

[RTD] W. Eliot Kimber, Peter Newcomb, Steve Newcomb. Version Management as Hypertext Application: Referent Tracking Documents. Online from http://www.isogen.com/papers/ref-track-docs-paper.pdf

[SnapCM] John D. Heintz, Joshua Reynolds. SnapCM: Abstract Model. Online from http://www.isogen.com/papers/snapCM.pdf

[Soares] Luiz Fernando G. Soares. Nested Composite Nodes and Version Control in Hypermedia Systems. Online from http://cs-people.bu.edu/dgd/workshop/soares.html

[TM] Topic Maps, Online from http://www.topicmaps.org

[UML] UML, Online from http://www.uml.org

[VIT] Fabio Vitali. Versioning Hypermedia. ACM Computing Surveys 31(4), December 1999. Online from http://www.cs.brown.edu/memex/ACM_HypertextTestbed/papers/50.html

[WebDAV] WebDAV, Online from http://www.webdav.org

[XAN] Theodor H. Nelson. Literary Machines, Edition 87.1, Sausalito Press, 1987. Online from http://www.sfc.keio.ac.jp/~ted/TN/PUBS/LM/LMpage.html

[XInclude] XInclude, Online from http://www.w3.org/XInclude

[XLink] XLink, Online from http://www.w3.org/XML/Linking

[XML] XML, Online from http://www.w3.org/XML

[XPointer] XPointer, Online from http://www.w3.org/XML/Linking


