Structured Software Assurance

John L. Clark
jclark@nps.edu

Abstract

This document introduces a loose architecture called Legere that builds on Web Architecture and structured authoring disciplines for developing software. This approach allows developers and evaluators to transparently inspect the internal structure of a software project like browsing the Web. The goal of this approach is to augment a high assurance development process, enabling mapping between design layers and facilitating a comprehensive understanding of the project.

Keywords: Editing/Authoring; Content Management

John L. Clark

John L. Clark is an information assurance researcher at the Naval Postgraduate School. He is currently focused on ways to present a cohesive view of high assurance software projects to system evaluators through the use of XML and related semantic web technologies.

Extreme Markup Languages 2006® (Montréal, Québec)

Copyright © 2006 John L. Clark. Reproduced with permission.

Introduction

How can software users—either individuals or organizations—better determine the correctness of the software they are using? Similarly, how can software developers better determine the correctness of the software they are creating? These questions take on additional weight when considered with respect to the security or the safety of the product in question. What software is secure? What software is safe? How can users determine these qualities?

The pedestrian approach to selecting software involves taking recommendations. The software seeker will ask friends and colleagues about the qualities of various software. She will seek out testimonials and product reviews, and try to determine how much trust she has in the sources of these recommendations. As a user, often she will then apply a "wait and see" strategy, where she takes the software's claims at face value and fixes problems as they arise. This strategy has obvious problems. The computer security field calls this the "penetrate and patch" method of security, and it is telling that "penetrate" comes before "patch" ([Sch79], [Sch01]).

Software evaluation is a more proactive approach. To actually evaluate the correctness of software, some process must be in place that can provide evidence of this correctness. The Common Criteria for Information Technology Security Evaluation (CC) [CC3] offers one such process. A process such as the CC tends to generate a great amount of documentation describing the security design and characteristics of the software in question. Accepted practices such as the CC grade this documentation into layers that cover different levels of abstraction in the software development process.

Currently, software development processes combine the documentation artifacts that they produce in an ad-hoc, manual way to be presented as evidence to evaluators, and eventually to users. A software project needs to be able to present a flexible and manageable view of its evidence to its evaluators. The goal with this work is to introduce an idea that encompasses a new way to manage the artifacts of the software life-cycle that treats those artifacts and their relationships in a structured manner. In this idea, a software development project will use Web Architecture [WebArch] and structured authoring disciplines to be able to transparently present and inspect its internal structure like browsing the Web. This paper describes technologies that a project can use to manage and present its information.

Web Architecture provides a model for thinking about Web content in a way that is consistent with common practice in utilizing the Web. Similarly, this paper's approach to reading a software project in order to gain assurance about its correctness builds on infrastructure and ideas that are already in place. The discussion begins in Section “Building on established technologies” by describing how the approach fits in with this infrastructure. In the following sections, the paper explores the new components that this project has added to support this approach.

Introducing an example

Before continuing, the introduction of a simple example will help make some of the following discussion more concrete.

User interfaces are important for helping authors create and annotate structured content. One advantage of XML is that it can be meaningfully utilized in a variety of interfaces. Many authors have environments set up to write documents in a particular XML dialect of choice. Authors grow familiar not only with the interface, but also with the dialect itself. One XML application, DocBook (http://docbook.sourceforge.net/), is a commonly used colloquial XML dialect (http://www.mulberrytech.com/Extreme/Proceedings/html/2004/Sperberg-McQueen01/EML2004Sperberg-McQueen01-toc.html). Authors might want to use their chosen interface and DocBook for writing Extreme Markup Languages conference papers. Another strength of XML is the existence of powerful concepts and tools for mapping from one XML namespace to another. To that end, this example develops and presents an XSLT script for converting DocBook to the Extreme Markup Languages conference dialect.

Conveniently, this development effort provides a brief illustration of this research into structured software assurance. This paper describes how its concepts can be used on this example as it moves through them. This is a droll example, but it is small enough to be discussed here, and the result of this example may also be of benefit to the readers of this paper.

Building on established technologies

The metaphor here for reading software is that of browsing the Web. Given that, tools such as LXR [LXR] and ViewVC (formerly ViewCVS) [ViewVC] are obvious examples of tools where users literally do browse software source code in a Web context. These tools are very helpful to developers, but they are inadequate for complete evaluation of software. First, source code is only part of the picture. In order to be able to characterize the correctness of software, evaluators need to know what behavior is expected, and what is an error. This leads them to documents stating requirements, documents describing models and designs, and other "non-executable" artifacts that provide the context for measuring the software. Second, source code indexing tools are primarily designed for navigation, and not for query and other kinds of reasoning. The goal is to be able to easily navigate the project as a result of having the underlying relationships exposed, not to have a navigation UI as the complete solution.

A second class of tools provides powerful features for managing a software development project on the Web. Examples of these collaboration tools include SourceForge (http://sourceforge.net/) and CollabNet (http://www.collab.net/); examples of projects using these tools include 4Suite (http://sourceforge.net/projects/foursuite) and Subversion (http://subversion.tigris.org/), respectively. These tools tend to focus on enabling direct communication both among developers and between developers and users. They expose various conduits for information such as mailing lists, bug trackers, and hypertext views of the source code like those mentioned above. Like these source code views, they essentially provide a broad array of entry points for navigating the artifacts of the project. They do not attempt to allow developers to model and then browse the architectural structure and design of their projects. Because of this, these tools are primarily useful for coordinating the day-to-day (or event-to-event) activity on a project, and so are mainly used by developers on a project. Further, the contents of these tools are either not designed for repurposing and query (for example, the mailing list archives) or not available at all (for example, the databases of project metadata).

To better expose the information contained within software project artifacts, to make this exposed information more easily queryable, and to naturally extend this information, a software project can equip evaluators and anyone interested in investigating the quality of software products with two main capabilities. First, evaluators need to be able to browse the project and the project's internal structure like they would browse the Web in order to be able to manually learn about the project. Second, they need to be able to query the project as a transparent data store in order to be able to automate some results. To do this, one goal of this research is to express project information in a Semantic Web context, using structured authoring to actually encode content and the Resource Description Framework (RDF) [RDFSyntax] to model the relationships within that content.

The name of this approach to developing and publishing project information is the Legere Architecture, or simply Legere. Legere integrates a number of technologies, including XML and RDF, in order to express project information.

Project relationships with RDF

One of the primary requirements of Legere is to enable mapping from one level of description in a project to a more detailed one. For example, at the top level a project will have a set of requirements and it may then have several levels of design that indicate how the project will meet these requirements. At the low level a project will take its design and then develop source code documents that should implement the project's design (and hence its requirements). A project needs a way to indicate that these refinement relationships exist. If it considers these different levels of description to be different objects, or resources, on the Web, then the mapping might look like a hyperlink. A hyperlink is insufficient, however, because it does not indicate the nature of the relationship between the linked and linking resource; instead it indicates only that some relationship exists. RDF is a technology that allows us to know the nature, or type, of these relationships, and therefore a project can use it to indicate the nature of these mappings as well as other relationships.

The Resource Description Framework (RDF) is "a language for representing information about resources in the World Wide Web" [RDFPrimer]. A project can always store information about such resources using natural language statements, but if it needs to allow this information to be processed using a computer then the project needs to use a well-defined syntax and model; RDF offers one such model and a set of syntaxes for this purpose. RDF has several "nice" properties that make it a natural choice for Legere. When representing information about resources, the resources themselves are referenced using the same identification technology as the Web: Uniform Resource Identifiers (URIs). This allows users to easily match and integrate the statements with the resources being described. RDF statements are also external to any of the resources being described, and so they can outline the structure of a project independent of the content of that project. The use of RDF (or other, similar technologies) in conjunction with the Web resources that it describes is known as the Semantic Web, and it is this conjunction that Legere leverages to build assurance in a software project.

The Semantic Web vision holds a good deal of promise for being able to perform automated inference to learn more about a body of information than what is immediately available. Legere uses this inference to help developers determine how their current work fits into the larger scheme of the project as well as to help evaluators make conclusions about the quality of the project. Furthermore, when combined with information about which resources have representations, a project can automatically produce a hypertext cross-referenced set of documents that reflect the state of the documents themselves as well as the relationships between those documents. Combining the structured documents and the RDF that binds them, the project can produce any number of different views on its data that can be used to inspect its status at many different levels of formality.

There are a number of challenges that a project faces when its authors (that is, its developers) try to treat individual artifacts as part of a larger web of data, and in particular when they must make RDF assertions about working data with respect to existing project information. First of all, they must be able to quickly identify the resource that they want to relate to their current work. For this reason, one part of this research involves exploring user interfaces for editing structured content, easily naming new content, and locating names for previously existing content. Using Legere, authors of software (that is, developers) will have a document-centric view of the project and the relationships between different project artifacts. That is, they should be able to easily filter the parts of the project that they see based upon the artifacts on which they are currently working. For example, if a developer is modifying a system specification at a certain level of abstraction, then she should be able to quickly determine what other parts of the project are affected by her change.

Another challenge is to provide developers with a useful set of verbs and noun types, an RDF Schema, that they can use to make assertions about elements (literally and figuratively) of the work that they are doing. Two important types in the Legere project schema are that of a Requirement, which indicates that the target resource expresses some characteristics the project must fulfill in order to be considered complete, and that of an Implementation, which indicates that the target resource corresponds to a machine-executable component used to achieve some purpose. One important verb, or property, in this schema is the implements action, which notes that some resource, which itself might be a Requirement or an Implementation, fulfills (i.e. refines) a Requirement, as described above. “Appendix: A simple RDF Schema for classifying software project relationships” provides an overview of the experimental RDF Schema for Legere. Developers can also use other RDF dialects to augment their descriptions of project resources.

RDF in the example

The DocBook2ExtremeML example project needs to be able to map between its requirements and the implementation of those requirements. In this way, users can perform a very simple evaluation by checking to see if all the requirements have an implementation. They can then further do a source code audit on these implementation components, using the corresponding requirement or requirements as a point of reference.

This project has a single RDF/XML file (DocBook2ExtremeML.xsl.xweb.rdf; see the link to the author package) that expresses a small set of relationships in the project. It identifies one requirement with the URI ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/DocBook2ExtremeML.xsl.xweb#requirement.mapping, and its working implementation with the URI ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/DocBook2ExtremeML.xsl.xweb#top. The following excerpt includes statements relating these two in this way:

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:project="ahttp://infinitesque.net/2006/rdfs/project/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/DocBook2ExtremeML.xsl.xweb">
  <project:Requirement rdf:ID="requirement.mapping">
    <dc:title>language mapping</dc:title>
    <dcterms:requires rdf:resource="ahttp://docbook.org/xml/4.4/"/>
  </project:Requirement>
  <project:Requirement rdf:ID="requirement.skippage"/>
  <project:Requirement rdf:ID="requirement.secnumbers"/>

  <project:Implementation rdf:ID="top">
    <project:partlyImplements rdf:resource="#requirement.mapping"/>
  </project:Implementation>

  <project:Implementation rdf:ID="templates.process-remaining">
    <project:implements rdf:resource="#requirement.skippage"/>
  </project:Implementation>

  <project:Implementation rdf:ID="template.root">
    <project:implements rdf:resource="#requirement.secnumbers"/>
  </project:Implementation>

  <rdf:Description rdf:about="ahttp://docbook.org/xml/4.4/">
    <dc:title>DocBook XML 4.4</dc:title>
    <foaf:page rdf:resource="http://docbook.org/xml/4.4/"/>
  </rdf:Description>
</rdf:RDF>

The full RDF file includes a few more statements that illustrate using other RDF dialects to flesh out the description of the project.

An interested user can then process this RDF in a variety of ways. One way, using IsaViz (http://www.w3.org/2001/11/IsaViz/), an RDF graph visualization tool, produces Figure 1, which a developer or evaluator can use to gain a visual perspective on the layout of a part of the project. The user can also query the RDF to learn about the status of the project. For example, the diagram shows that according to developer annotations, some requirements are not fully met.
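The "simple evaluation" described above, checking whether every requirement has an implementation, can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: it walks the element structure of a cut-down copy of the RDF/XML excerpt with the standard library rather than using an RDF parser or a query language, so it would not cope with other serializations of the same statements. The function name `requirement_coverage` and the three status labels are invented for this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROJECT_NS = "ahttp://infinitesque.net/2006/rdfs/project/"

# Cut-down copy of the excerpt: only the typing and implementation statements.
RDF_XML = """\
<rdf:RDF xmlns:project="ahttp://infinitesque.net/2006/rdfs/project/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <project:Requirement rdf:ID="requirement.mapping"/>
  <project:Requirement rdf:ID="requirement.skippage"/>
  <project:Requirement rdf:ID="requirement.secnumbers"/>
  <project:Implementation rdf:ID="top">
    <project:partlyImplements rdf:resource="#requirement.mapping"/>
  </project:Implementation>
  <project:Implementation rdf:ID="templates.process-remaining">
    <project:implements rdf:resource="#requirement.skippage"/>
  </project:Implementation>
  <project:Implementation rdf:ID="template.root">
    <project:implements rdf:resource="#requirement.secnumbers"/>
  </project:Implementation>
</rdf:RDF>
"""


def requirement_coverage(rdf_xml):
    """Map each requirement ID to a status label (labels are this sketch's own)."""
    root = ET.fromstring(rdf_xml)
    # Every declared requirement starts out with no known implementation.
    coverage = {
        req.get(f"{{{RDF_NS}}}ID"): "unimplemented"
        for req in root.findall(f"{{{PROJECT_NS}}}Requirement")
    }
    properties = (
        ("partlyImplements", "partly implemented"),
        ("implements", "implemented"),
    )
    for impl in root.findall(f"{{{PROJECT_NS}}}Implementation"):
        for prop, status in properties:
            for link in impl.findall(f"{{{PROJECT_NS}}}{prop}"):
                target = link.get(f"{{{RDF_NS}}}resource", "").lstrip("#")
                # Never downgrade a requirement that is already fully implemented.
                if target in coverage and coverage[target] != "implemented":
                    coverage[target] = status
    return coverage


for requirement, status in requirement_coverage(RDF_XML).items():
    print(f"{requirement}: {status}")
```

For the excerpt, this reports the mapping requirement as only partly implemented, which matches what the diagram shows. A project would more plausibly ask the same question with an RDF query language, so that the answer does not depend on how the statements happen to be serialized.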

Figure 1: Visualization of the RDF Statements about the DocBook2ExtremeML XSLT Script

Topic Maps instead of RDF

In Topic Maps For Open (Source) Developers [Ahm05], Kal Ahmed describes several ways that Topic Maps can be used to integrate documentation in a software project. This approach is very similar to the one this section describes. There are several reasons why Legere uses RDF instead of Topic Maps for cross-referencing a project's data. RDF seems to fit better into a "direct mapping" model, instead of the indexing model that Topic Maps uses. RDF is very closely tied to resource identification using URIs, which fits better into the web perspective of evidence presentation that Legere is trying to achieve. RDF has good tool and specification support (for example, a query language that supports XML results) that lends itself to integration with other Web technologies.

Still, Topic Maps would likely work for a software project's mapping purposes. They support direct mapping, even if it is not their most obvious model. They can use URIs for resource identification (particularly in their XML interchange syntax), and the tools may fit nicely into this architecture. RDF simply seemed to be the clearer choice, and even that choice is continually being reevaluated. One of the goals with Legere has been to have loosely coupled components in order to give it flexibility and adaptability. The real take-away from the discussion in this section should be that Semantic Web technologies can bring documentation together in a formal way that can support coherently "reading" a more complete presentation of a project. RDF provides a potential path to this goal.

Structured authoring with XML and Literate Programming

RDF enables developers to make individual statements about components of the project. How do they identify and access components when there are multiple components within a single document? For example, many requirements may be individually described within a larger requirements document. Also, the ultimate product of a project is its implementation. What happens if they want to make statements about the relationships between a statement in a piece of nonexecutable documentation and an implementation fragment, a piece of source code? If a project uses XML for its documentation then developers can give identifiers to these individual elements and then reference individual elements (literally) using URIs with fragment identifiers. Source code fragments can also have URI-compatible identifiers if a project stores them in XML elements. Once these document components can be identified using URIs, developers can make statements directly about the contents of these elements using RDF.
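To make the role of fragment identifiers concrete, the following Python sketch resolves a URI with a fragment identifier to the XML element that carries the matching `id` attribute. It is a minimal illustration under stated assumptions, not part of Legere: the `store` dictionary stands in for whatever mechanism retrieves a document by URI, the `itemizedlist` wrapper is assumed, the requirement text is abbreviated from the example project, and `id` attributes are matched by plain comparison rather than through DTD-declared ID types.

```python
import xml.etree.ElementTree as ET

DOC_URI = ("ahttp://infinitesque.net/projects/DocBook2ExtremeML/"
           "trunk/source/DocBook2ExtremeML.xsl.xweb")

# Stand-in for the retrieved document (wrapper element assumed, text abbreviated).
DOC_XML = """\
<itemizedlist>
  <listitem id="requirement.mapping">
    <para>This stylesheet shall approximate the DocBook 4.4 article content.</para>
  </listitem>
</itemizedlist>
"""


def resolve(uri, store):
    """Return the element named by uri, or None if the fragment is unknown.

    `store` maps fragment-free document URIs to XML text (hypothetical).
    """
    base, _, fragment = uri.partition("#")
    root = ET.fromstring(store[base])
    if not fragment:
        return root  # the URI names the whole document
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


element = resolve(DOC_URI + "#requirement.mapping", {DOC_URI: DOC_XML})
print(element.tag, "->", element.find("para").text)
```

Once an element can be reached this way, the same URI can serve as the subject or object of an RDF statement, which is the property the rest of the approach relies on.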

With respect to source code, structured authoring can take the form of Literate Programming [Knu92]. Legere takes advantage of Literate Programming as a tool for explicitly encapsulating targeted sections of code—or fragments—in order to make assertions about those fragments in the context of other project artifacts. Legere currently uses XWEB [Wal02], an implementation of Literate Programming in XML, as the basis for "Literate" source code artifacts. Figure 2 illustrates the concept of using Literate Programming fragments and other structured content as potential resources to be used in assertions about the project. XWEB conveniently assigns each source fragment an identifier, as illustrated by the bottom box. Developers can assign identifiers to elements in nonexecutable documentation as well, as illustrated by the top box. The RDF/XML that relates these two articles of the project is listed in the center. This RDF annotates each article individually and describes their relationship.

Figure 2: Literate Programming Fragments as Endpoints for RDF Assertions

While the prime motivation for using XML for structured authoring is the ability to reference content to an arbitrary level of granularity, the other benefits of XML also contributed to the decision to use it in Legere. XML is a reasonably transparent data format. First, this allows users to reuse the data from a software project with other tools. Second, this gives the data a long lifetime (that is, a high degree of integrity). Finally, this allows users to easily restructure this data for different purposes or views. These advantages present in XML flow from another of its core properties: it is a technology around which there is open and vibrant discussion, and it is free (as in speech) and not coupled to a particular vendor.

Literate Programming in XML provides the benefits of both Literate Programming and XML. Literate Programming was originally intended to provide a flexible way for authors to create full and rich documentation for their code, and this aspect is still very important for evaluation purposes. The very concept of slicing source code up into meaningful, readable fragments, though, encapsulates these fragments so that developers can identify them and then link them into the broader context of the project.

Structured authoring in the example

In a larger project, the project team would likely separate out statements of requirements from the implementation of those requirements. The team might also provide other layers of documentation to support the design and model of the implementation. For this simple example, however, the descriptions and the implementation of this "project" are contained in a single DocBook+XWEB file (DocBook2ExtremeML.xsl.xweb in the author package). The example project already has some statements about its components as presented in Section “RDF in the example”. That section highlights two URIs: one for a requirement, and the other for the implementation of that requirement. Here is the requirement itself:

<listitem id="requirement.mapping">
  <para>This stylesheet shall approximate the DocBook 4.4
  <sgmltag>article</sgmltag> content from the source document as
  closely as possible in the content of the output document.</para>
</listitem>

The following listing contains the implementation of that requirement. Note that the implementation itself refers to other fragments that would need to be expanded in order to produce the complete implementation. Authors could also make statements about these "descendant" fragments in order to describe their relationship to the design of the project.

<src:fragment id="top">
<xsl:stylesheet version="1.0" exclude-result-prefixes="src">
  <xsl:output doctype-system="extremepaperxml.dtd"/>

  <xsl:param name="secnumbers" select="'0'"/>

  <xsl:key name="idkey" match="*" use="@id"/>

  <src:fragref linkend="template.root"/>

  <src:fragref linkend="templates.process-remaining"/>

  <src:fragref linkend="default_templates"/>

  <src:fragref linkend="template.article"/>

  <src:fragref linkend="templates.front"/>

  <src:fragref linkend="templates.body"/>

  <src:fragref linkend="templates.rear"/>

  <src:fragref linkend="templates.cross-references"/>

  <src:fragref linkend="templates.para"/>

  <src:fragref linkend="identity"/>

  <src:fragref linkend="inlines"/>
</xsl:stylesheet>
</src:fragment>
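The expansion step mentioned above, in which each fragment reference is replaced by the fragment it names, can be sketched as follows. This Python fragment is not the XWEB tool itself: it is a simplified stand-in that handles element content only (text inside fragments is not merged), the namespace bound to `src` is an assumption about the XWEB vocabulary, and the two template bodies in the sample document are placeholders rather than the project's real templates.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for XWEB's src:* elements; adjust if the project differs.
SRC_NS = "http://nwalsh.com/xmlns/litprog/fragment"
FRAGMENT = f"{{{SRC_NS}}}fragment"
FRAGREF = f"{{{SRC_NS}}}fragref"

# Cut-down stand-in for the example file; template bodies are placeholders.
XWEB_DOC = """\
<article xmlns:src="http://nwalsh.com/xmlns/litprog/fragment"
         xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <src:fragment id="top">
    <xsl:stylesheet version="1.0">
      <src:fragref linkend="template.root"/>
      <src:fragref linkend="identity"/>
    </xsl:stylesheet>
  </src:fragment>
  <src:fragment id="template.root">
    <xsl:template match="/"><xsl:apply-templates/></xsl:template>
  </src:fragment>
  <src:fragment id="identity">
    <xsl:template match="@*|node()"><xsl:copy/></xsl:template>
  </src:fragment>
</article>
"""


def expand(element, fragments, active=()):
    """Copy element, replacing each src:fragref with the fragment's content."""
    clone = ET.Element(element.tag, dict(element.attrib))
    clone.text, clone.tail = element.text, element.tail
    for child in element:
        if child.tag != FRAGREF:
            clone.append(expand(child, fragments, active))
            continue
        target = child.get("linkend")
        if target in active:
            raise ValueError(f"fragment {target!r} refers to itself")
        expanded = expand(fragments[target], fragments, active + (target,))
        clone.extend(list(expanded))  # splice in the fragment's child elements
    return clone


def tangle(xweb_xml, top_id):
    """Return the fully expanded first child element of the named fragment."""
    root = ET.fromstring(xweb_xml)
    fragments = {frag.get("id"): frag for frag in root.iter(FRAGMENT)}
    return expand(fragments[top_id], fragments, (top_id,))[0]


stylesheet = tangle(XWEB_DOC, "top")
print([child.get("match") for child in stylesheet])
```

Because every reference is resolved through the fragment's identifier, the identifiers that make the code assemblable are the same ones that RDF statements can point at.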

XML enables the descriptions of the sample project (in RDF) to be tied directly to the content of the project itself.

Documentation mapping and reintegration

RDF provides the means to make specific, machine-processable statements about arbitrary resources. Evaluators will need to be able to visualize and navigate the set of resources and statements in a unified fashion. In a website, individual resources, or web pages, provide navigation to other resources through hyperlinks (links). A view of the project can consider the components of an RDF statement to be potential endpoints for links, and so it can create links to incorporate information from the RDF statements into views of the resources they describe. A set of RDF statements (an RDF graph) can create any number of different structures that could contribute to the description of a given resource. In order to meaningfully incorporate this description into a view of the resource, a project needs to be able to organize these statements into a regular form, and identify where it wants this description to be placed.

One natural tool for producing website content (for example, XHTML) from XML sources is XSLT. If this XHTML is going to contain additional links based upon a pool of RDF statements, then information from these RDF statements needs to be available to the XSLT scripts via some mechanism. RDF does not always come in an XML form, and even when it does, XSLT is not well-suited to searching through sets of RDF statements for patterns and meaning. SPARQL [SPARQL], on the other hand, provides an interface that was designed for querying RDF statements for specific patterns. It even provides a convenient XML format for its query results [SPARQLResults]. In this way, SPARQL offers a channel for constructing ordered XML reports about complex RDF data. A project then needs to be able to insert specific SPARQL query results into specific places in its document views.

In order to annotate resources with SPARQL query results pertaining to those resources, Legere has developed a very simple container syntax for SPARQL statements called reintegration (r11n). Reintegration provides a simple XML encapsulation for a single SPARQL statement. A set of reintegration containers can be placed in arbitrary locations within any XML document (e.g. using an XSLT script). Next, a reintegration processor scans a document, and replaces each reintegration container with the results of executing the query, contained in the container, against some set of RDF statements. Finally, an XSLT script can process the newly SPARQLing document, converting it into the desired output format.

One advantage of reintegration container elements is that they allow the formatting pipeline to carefully determine how the RDF will be presented in a document. First, reintegration containers can be placed anywhere in a given document, which allows a later formatting stage to display the results in the correct structural context within the document. Second, reintegration containers can use the full power of SPARQL to retrieve the desired data from the external data set.

Reintegration in the example

In the ongoing example, the requirement for proper mapping from DocBook to ExtremeML is expressed in a listitem. In a web view of the requirement, it will likely be displayed as a bullet point. Previous discussions of this example have introduced RDF annotation for this resource, and the web view of the requirement should integrate some of this annotation. An XSLT script (inject-queries.xsl) adds a reintegration template to any element which also has an identifier. The following listing shows the reintegration template for the element of interest after it has been placed in context:

<listitem id="requirement.mapping">
  <para>This stylesheet shall approximate the DocBook 4.4
  <sgmltag>article</sgmltag> content from the source document as
  closely as possible in the content of the output document.</para>
  
  <r11n:sparql base="ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/DocBook2ExtremeML.xsl.xweb"
               context="#requirement.mapping">                 
BASE <r11n:base/>
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dc: &lt;http://purl.org/dc/elements/1.1/&gt;
PREFIX project: &lt;ahttp://infinitesque.net/2006/rdfs/project/&gt;
PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;

SELECT ?propLabel ?targetLink ?targetText
WHERE
{ 
  <r11n:context/> a project:Requirement ;
                  ?prop ?forward .
  ?forward foaf:page ?targetLink .
  ?forward dc:title ?targetText .
  { { ?prop rdfs:label ?propLabel }
    UNION
    { ?prop dc:title ?propLabel } }
}
  </r11n:sparql>
</listitem>

In this reintegration container, the SPARQL statement has two elements in its text: r11n:base and r11n:context. A reintegration processor will replace these with either the values provided in the attributes of the reintegration container or the base URI in scope and the "closest" fragment reference, respectively. This SPARQL query asks for the page, the title, and the label of any properties that annotate the current context (which, in this case, is the mapping requirement). This SPARQL query could have asked for an arbitrarily complex pattern of data surrounding the current context. Another interesting query in this situation might be to locate any implementations of the current requirement, based upon developer annotations; this is left as an exercise for the reader.
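The substitution step can be illustrated with a short Python sketch that turns a reintegration container into plain SPARQL text. Several things here are assumptions rather than facts about the Legere processor: the namespace URI bound to `r11n` is a placeholder, the query is abbreviated, only the attribute-supplied case is handled (not the fallback to the base URI in scope or the closest identifier), and the sketch assumes the processor wraps each substituted value in angle brackets so that the result is a valid IRI reference.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: the real r11n namespace URI is not given in this paper.
R11N_NS = "http://example.org/ns/r11n"

# Abbreviated container; only the attribute-supplied base and context are used.
CONTAINER_XML = """\
<r11n:sparql xmlns:r11n="http://example.org/ns/r11n"
    base="ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/DocBook2ExtremeML.xsl.xweb"
    context="#requirement.mapping">
BASE <r11n:base/>
PREFIX project: &lt;ahttp://infinitesque.net/2006/rdfs/project/&gt;

SELECT ?prop ?forward
WHERE { <r11n:context/> a project:Requirement ; ?prop ?forward . }
</r11n:sparql>
"""


def build_query(container):
    """Flatten a container element into SPARQL text with placeholders filled."""
    substitutions = {
        f"{{{R11N_NS}}}base": container.get("base"),
        f"{{{R11N_NS}}}context": container.get("context"),
    }
    parts = [container.text or ""]
    for child in container:
        value = substitutions.get(child.tag)
        if value is None:
            raise ValueError(f"no value available for placeholder {child.tag}")
        parts.append(f"<{value}>")      # assumed: values become IRI references
        parts.append(child.tail or "")  # query text that follows the placeholder
    return "".join(parts).strip()


print(build_query(ET.fromstring(CONTAINER_XML)))
```

The resulting string could then be handed to any SPARQL engine together with the project's RDF statements; that step is omitted here because it depends on the engine chosen.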

The following code listing shows the new results after a reintegration processor has inserted the query results in place of the reintegration container.

<listitem id="requirement.mapping">
  <para>This stylesheet shall approximate the DocBook 4.4
  <sgmltag>article</sgmltag> content from the source document as
  closely as possible in the content of the output document.</para>

  <sparql xmlns="http://www.w3.org/2005/sparql-results#"
          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:xs="http://www.w3.org/2001/XMLSchema#">
    <head>
      <variable name="propLabel"/>
      <variable name="targetLink"/>
      <variable name="targetText"/>
    </head>
    <results distinct="false" ordered="false">
      <result>
        <binding name="propLabel">
          <literal xml:lang="en-US">Requires</literal>
        </binding>
        <binding name="targetLink">
          <uri>http://docbook.org/xml/4.4/</uri>
        </binding>
        <binding name="targetText">
          <literal>DocBook XML 4.4</literal>
        </binding>
      </result>
    </results>
  </sparql>
</listitem>

It would then be very easy in XSLT to convert this to XHTML. Since the "host" language is DocBook, for this example the view could use a simple customization layer on top of the DocBook XSL distribution. The following listing shows one possible result. Clearly, the stylesheet could format this data in a number of different ways, and it could format one results element differently from another.

<li>
  <p>This stylesheet shall approximate the DocBook 4.4
  <code>article</code> content from the source document as
  closely as possible in the content of the output document.</p>

  <p class="meta">This item requires
  <a href="http://docbook.org/xml/4.4/">DocBook XML 4.4</a>.</p>
</li>
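The paper's pipeline performs this last conversion in XSLT. Purely to show the shape of the transformation, the following Python sketch does the equivalent step with the standard library: it reads a SPARQL query results document and produces one `p` element of class `meta` per result row. The function name `meta_paragraphs` and the sentence template ("This item ...") are choices made for this sketch, mirroring the sample output rather than any actual stylesheet.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"

# The single result row from the example, without the head section.
RESULTS_XML = """\
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <results>
    <result>
      <binding name="propLabel"><literal xml:lang="en-US">Requires</literal></binding>
      <binding name="targetLink"><uri>http://docbook.org/xml/4.4/</uri></binding>
      <binding name="targetText"><literal>DocBook XML 4.4</literal></binding>
    </result>
  </results>
</sparql>
"""


def row_values(result):
    """Collect a result row as {variable name: text of the bound value}."""
    return {
        binding.get("name"): binding[0].text
        for binding in result.findall(f"{SR}binding")
    }


def meta_paragraphs(results_xml):
    """Render each result row as a serialized XHTML-style paragraph."""
    paragraphs = []
    for result in ET.fromstring(results_xml).iter(f"{SR}result"):
        row = row_values(result)
        p = ET.Element("p", {"class": "meta"})
        p.text = f"This item {row['propLabel'].lower()} "
        link = ET.SubElement(p, "a", {"href": row["targetLink"]})
        link.text = row["targetText"]
        link.tail = "."
        paragraphs.append(ET.tostring(p, encoding="unicode"))
    return paragraphs


for paragraph in meta_paragraphs(RESULTS_XML):
    print(paragraph)
```

Keeping the query results as XML until this point is what lets the formatting stage decide, row by row, how each annotation should appear.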

Identifying resource contexts

Because Legere looks to RDF for its basic relationship model, Web Architecture concepts become important when it comes to naming and managing resources. Part of this research, therefore, investigates how to easily create and manage URIs as well as how to expose important resources using URIs.

In the Legere model, individual resources will appear in a number of different contexts, and a resource has a different identity within each context. For example, a particular document might have a file URI on a particular developer's workstation indicating the filename of that document on her machine. The same document would also have an http URI indicating its location in the project web view, and in many situations it is useful to give the same document another URI that is independent of any location. Legere calls these three contexts the local, exposed, and global contexts, respectively.

The local context of a resource is likely to change from workstation to workstation, and users need a way to easily track the relationship between the local naming layout and the other resource identities. This research is experimenting with a simple URI mapping document format called WebCap [Cla06] in order to help solve this problem. The WebCap format groups related URIs into small bundles of typed components, and explicitly allows for these bundles to be managed in such a way as to take advantage of the hierarchical structure of many URIs.

The basic model for WebCap is that of a capability list. The author of a WebCap document should dedicate that document to expressing a single purpose, or capability; any resource whose URI is marked as being exposed participates in this capability. In this way WebCap documents can be used to "color" resources in a manner that crosses URI namespaces for a variety of different applications. For example, the original goal of developing WebCap was to provide a way to locate local resources given an exposed or global name (or the other way around). Given the information about which resources should be exposed, though, the same WebCap document (or set of documents) could be used to select resources that should be built and installed into an exposed view.

WebCap in the example

URIs are used in a number of different places within the pieces of the example that previous sections discuss. One of these URIs is the URI for the DocBook2ExtremeML source file: ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/DocBook2ExtremeML.xsl.xweb. Users might also want references to the same resource in different contexts. For example, a developer might want a reference to the document that can be resolved locally (e.g. a file URI), or an evaluator might want a reference to the document that is exposed in a web environment (e.g. an http URI). A WebCap document can be used here to express the fact that this URI is available in these contexts. The following code listing shows a document containing such an entry.

<capabilities xmlns="ahttp://infinitesque.net/2006/ns/capabilities/">
  <relationship>
    <sameAs xmlns="http://www.w3.org/2002/07/owl#"/>
  </relationship>

  <map>
    <global>ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/
      DocBook2ExtremeML.xsl.xweb</global>
    <local>file:///home/john/cache/2006/structured%20software%20assurance/
      DocBook2ExtremeML.xsl.xweb</local>
    <exposed>http://www.mulberrytech.com/Extreme/Proceedings/html/2006/Clark01/
      DocBook2ExtremeML.xsl.xweb</exposed>
  </map>
</capabilities>

A user can then use this WebCap document to replace URIs of one type with URIs of another type as needed. This is very similar to the functionality of XML Catalogs [XMLCatalogs], but WebCap documents provide typed classes for the different identifiers that allow a user to determine which identifier is appropriate for a given context and function.
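The replacement of one type of URI with another can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the WebCap reference processing model: the function names (load_maps, translate, exposed_uris) are hypothetical, the sketch assumes that whitespace inside the URI elements of the listing is layout only, and it matches whole URIs rather than exploiting their hierarchical structure.

```python
import xml.etree.ElementTree as ET

# WebCap namespace from the listing above, in ElementTree notation.
CAP = "{ahttp://infinitesque.net/2006/ns/capabilities/}"

WEBCAP = """\
<capabilities xmlns="ahttp://infinitesque.net/2006/ns/capabilities/">
  <relationship>
    <sameAs xmlns="http://www.w3.org/2002/07/owl#"/>
  </relationship>

  <map>
    <global>ahttp://infinitesque.net/projects/DocBook2ExtremeML/trunk/source/
      DocBook2ExtremeML.xsl.xweb</global>
    <local>file:///home/john/cache/2006/structured%20software%20assurance/
      DocBook2ExtremeML.xsl.xweb</local>
    <exposed>http://www.mulberrytech.com/Extreme/Proceedings/html/2006/Clark01/
      DocBook2ExtremeML.xsl.xweb</exposed>
  </map>
</capabilities>
"""


def load_maps(webcap_xml):
    """Return one {context: uri} dictionary per <map> element."""
    root = ET.fromstring(webcap_xml)
    maps = []
    for map_element in root.findall(CAP + "map"):
        entry = {}
        for child in map_element:
            if not child.tag.startswith(CAP):
                continue  # ignore elements from other namespaces
            context = child.tag[len(CAP):]  # 'global', 'local' or 'exposed'
            # Assumption: line breaks inside the URI are layout only.
            entry[context] = "".join((child.text or "").split())
        maps.append(entry)
    return maps


def translate(maps, uri, target):
    """Return the URI in the target context for the same resource, or None."""
    for entry in maps:
        if uri in entry.values():
            return entry.get(target)
    return None


def exposed_uris(maps):
    """Select the resources that participate in the exposed view."""
    return [entry["exposed"] for entry in maps if "exposed" in entry]


if __name__ == "__main__":
    maps = load_maps(WEBCAP)
    global_uri = ("ahttp://infinitesque.net/projects/DocBook2ExtremeML/"
                  "trunk/source/DocBook2ExtremeML.xsl.xweb")
    print(translate(maps, global_uri, "local"))
    print(exposed_uris(maps))
```

The lookup is keyed on the element name of each entry, which is what lets a caller ask for the identifier appropriate to a given context; the same parsed data also answers the question of which resources belong in an exposed view.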

Summary and challenges

What does the Legere approach look like at a practical level? As introduced here, it is conceptually fairly straightforward. Software project developers use the XML dialect of their choice for writing project documentation. This documentation will include such items as requirements, specifications, process information, and other descriptions of the characteristics of the project. The implementation of the project—its source code, which can itself be thought of as one type of documentation—is maintained using XWEB fragments likely mixed with the same XML dialect. As they are creating or editing project content, authors maintain a set of RDF statements in parallel about this content and its relationships with the rest of the project. Both these statements and the target content are reviewed by administrators of the project for "correctness" as the project defines it, which likely includes concepts of consistency and completeness.

Documents, source code, developer RDF statements, inferred RDF statements, and gleaned RDF statements are combined together and published as a set of hyperlinked documents (a website) that provide a view of this data. Developers can utilize this site and the underlying data to facilitate their development effort, and evaluators can utilize the site and the data to determine whether the project meets certain standards. Critical project relationships are explicitly stated and can be both browsed and queried. In this way, Legere supports the documentation and mapping needs of an assurance process framework such as the Common Criteria.

This paper peeked into a small example project surrounding the development of an XSLT script. This development did not attempt to follow an established development process, but the project does combine its documentation and data into a simple structure that speaks for itself. This is the whole point of Legere. Any DocBook 4.4 document can serve as a test or demonstration of the project, including the DocBook sources for this paper (EML2006Clar0511.dbk in the author package).

As Legere builds upon the Semantic Web, it also builds upon its challenges. Experience has shown that structured authoring in XML is a very alien mode of operation for even technically savvy developers, and that is just the first step in the content staircase. Writing source code nonlinearly, in a hypertext environment (that is, with Literate Programming) is very different from the normal serialized, flat production of source code. Perhaps the greatest challenges lie with writing meaningful RDF statements, however. Taken in context, though, it is highly challenging to show that a software project is correct, and meeting this challenge will require new approaches and additional effort.

Fundamentally, this research is motivated by the hope that the developers of software projects will move to open up their architectures and designs to more meaningful inspection. Legere, or components of Legere, may provide useful ideas on how to achieve this goal.

Appendix: A simple RDF Schema for classifying software project relationships

This section describes a simple, experimental RDF Schema for guiding the expression of software project relationships. This schema uses the XML Namespace URI ahttp://infinitesque.net/2006/rdfs/project/ for its terms, and this description uses project to represent this XML Namespace in QNames.

Classes

project:Article

The core RDF class for project resource documents is the Article. The Article class is a subclass of rdfs:Resource. This means that everything that is an Article is also an rdfs:Resource (any resource). A project Article is any item within the scope of a (software) project contributing substantively to the project. This class is subclassed in the following sections to describe specific ways in which project resources contribute to the project.

project:Component

The Component class is a subclass of Article; resources of this type have a parent Article as part of the same file.

project:Documentation

The Documentation class is a subclass of Article; resources of this type do not change the state of the environment over time when considered as elements of a product of the project.

project:Implementation

The Implementation class is a subclass of Article; resources of this type perform some function and change the state of the environment over time when considered as elements of a product of the project. This change of state is typically desirable, as it should meet one or more requirements equivalent to such a state change, although it may be unintentional.

project:Assumption

The Assumption class is a subclass of Article; resources of this type are statements that are automatically added to the knowledge about the environment in which a product of the project will exist.

project:Requirement

The Requirement class is a subclass of Documentation; resources of this type describe desired state change capabilities. The state change is taken with respect to the actors' environment. Without the project, this change is typically infeasible or lacks a particular quality given a particular configuration. A Requirement states the change that must take place, possibly including characteristics about how the change must take place (known as "quality-of" characteristics).

project:Input

The Input class is a subclass of Documentation; resources of this type describe information flow into a software project Article or collection of Articles.

project:Output

The Output class is a subclass of Documentation; resources of this type describe information flow out of a software project Article or collection of Articles.

project:Function

The Function class is a subclass of Requirement; resources of this type specify behavioral characteristics requested to bring the environment from one state to another.

One succinct way of describing a Function, with an emphasis on its role in a software system, is that it is "the mapping of inputs to outputs, and their various combinations" [LefWid99]. This definition has the further quality that it maps nicely to the standard mathematical definition of a function.

project:Constraint

The Constraint class is a subclass of Requirement; resources of this type describe the nonfunctional characteristics to which a state change must hold. Together with Function, these two classes partition the domain of Requirements.

project:Evidence

The Evidence class is a subclass of Documentation; resources of this type contribute to an argument in favor of some proposition.

project:Example

The Example class is a subclass of Evidence; resources of this type represent single points in a sample space intended to illustrate some proposition.

project:Claim

The Claim class is a subclass of rdf:Statement. This means that anything that is a Claim must be a statement. Resources of type Claim are statements about the state of the project; additional information may be needed to determine whether these statements are true. A resource of type Claim is a statement that requires reasoning to be added to a particular knowledge base.
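The subclass relationships described in this section can be collected into a small lookup table. The following Python sketch is only an illustration of the hierarchy as stated above, not a serialization of the schema itself; the names SUBCLASS_OF, ancestors, and is_a are hypothetical, and only the subclass links stated in this appendix are recorded.

```python
# Direct superclass of each class, as stated in the descriptions above.
SUBCLASS_OF = {
    "project:Article": "rdfs:Resource",
    "project:Component": "project:Article",
    "project:Documentation": "project:Article",
    "project:Implementation": "project:Article",
    "project:Assumption": "project:Article",
    "project:Requirement": "project:Documentation",
    "project:Input": "project:Documentation",
    "project:Output": "project:Documentation",
    "project:Function": "project:Requirement",
    "project:Constraint": "project:Requirement",
    "project:Evidence": "project:Documentation",
    "project:Example": "project:Evidence",
    "project:Claim": "rdf:Statement",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, other):
    """True if every resource of type cls is also of type other."""
    return cls == other or other in ancestors(cls)


if __name__ == "__main__":
    print(ancestors("project:Function"))
    print(is_a("project:Example", "project:Documentation"))
    print(is_a("project:Claim", "project:Article"))
```

Walking the chain makes the two roots of the schema visible: every class except Claim descends from Article, while Claim descends from rdf:Statement.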

Properties

project:hasConvention

The hasConvention relationship describes any Article and can have as its value any rdfs:Resource. The hasConvention property is a subproperty of rdfs:seeAlso. This relationship states that the subject is party to a project convention described by the object.

Many times when working on a software engineering project, conventions are used that may not be very intuitive when seen for the first time or when taken out of context. It is important to be able to refer to the descriptions of these conventions when these conventions are used. For example, software implementation often makes use of a certain structure for variable names to indicate a certain relationship to the functionality intended in these variable names, and these sections of the implementation are related by convention to a description of the intended meaning.

project:meetsRequirement

The meetsRequirement relationship describes any Article and can have as its value any Requirement.

The notion of a requirement with respect to a project is subtly different from the generic sense of the relationship embodied by a requirement. This property captures this semantic difference. This broad relationship states that the subject of the statement (in some way) meets the requirement of the object resource, which must be of type Requirement. A requirement, then, in this sense is a target or a delta between the current state of the environment and the desired future state of the environment; this property is a claim of accomplishment of such an existing delta.

A user should subclass this relationship and the Requirement class to prescribe project- and domain-specific types of requirements.

project:equivalentToCCRequirement

The equivalentToCCRequirement relationship describes any Requirement and can have as its value any rdfs:Literal.

This relationship means that the subject is equivalent to the Common Criteria (CC) requirement. Currently, this CC requirement is intended to be encoded textually in the same manner as that expressed in the specification (for example, FAU_GEN.2.1), but it may be desirable for there to be standard URI references for the Requirement resources defined by the Common Criteria, in which case this property will become unnecessary. If this happens, the semantics of this property will enable an automated migration to the utilization of URI references.

project:dependsOn

The dependsOn relationship describes any Article and can have as its value any Article, and it is a subproperty of dcterms:requires. This relationship states that the subject does not have a consistent meaning when taken apart from the object.

project:hasComponent

The hasComponent relationship describes any Article and can have as its value any Article, and it is a subproperty of dcterms:hasPart. This relationship simply links child Articles with their parents. For example, a single project Article, such as a detailed specification, may satisfy higher-level Requirements and introduce more detailed ones; each more detailed Requirement would be linked to this overall Article (likely of type Documentation) as objects of this relationship.

project:implements

The implements relationship describes any Implementation and can have as its value any Requirement. This relationship states that the subject resource completely provides additional functional details purposefully left out of the object resource.

project:partlyImplements

The partlyImplements relationship describes any Implementation and can have as its value any Requirement. This relationship states that the subject resource contributes additional functional details purposefully left out of the object resource.

project:providesEvidenceFor

The providesEvidenceFor relationship describes any Evidence and can have as its value any Claim statement. This relationship states that the subject Evidence helps to prove the object Claim correct.

project:tests

The tests relationship describes any Evidence and can have as its value any Claim statement. This relationship states that the subject Evidence contributes an active, dynamic component to a proof of the correctness of the object Claim.

The following two tables summarize this schema.

Table 1: RDF Classes

| Class Name | Summary |
| --- | --- |
| Article | A document contributing substantively to a goal-based project |
| Component | An Article which is a physical part of another document |
| Documentation | An Article that provides information |
| Implementation | An Article that performs some action |
| Assumption | An Article that describes a project's environment |
| Requirement | Documentation that describes a needed environmental change capability |
| Input | Documentation that describes an avenue for information flow into some aspect of the project |
| Output | Documentation that describes an avenue for information flow out of some aspect of the project |
| Function | A Requirement that describes behavioral characteristics of a system |
| Constraint | A Requirement that is not also a Function |
| Evidence | Documentation provided to support a particular argument |
| Example | Documentation provided to support a particular argument through illustration |
| Claim | A Statement which may or may not be true |

Table 2: RDF Properties

| Property Name | Summary | Domain | Range |
| --- | --- | --- | --- |
| hasConvention | The object is an ad-hoc standard of the subject | Article | rdfs:Resource |
| meetsRequirement | The subject meets the object requirement | Article | Requirement |
| equivalentToCCRequirement | The subject requirement maps to the Common Criteria requirement | Requirement | rdfs:Literal |
| dependsOn | The subject article depends on the object article | Article | Article |
| hasComponent | The subject has the object as a component | Article | Article |
| implements | The subject implements the object requirement | Implementation | Requirement |
| partlyImplements | The subject contributes to implementing the object requirement | Implementation | Requirement |
| providesEvidenceFor | Evidence for the object Claim | Evidence | Claim |
| tests | Illustration providing Evidence for the object Claim | Evidence | Claim |
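The domain and range columns of Table 2 lend themselves to a simple mechanical check of developer statements. The Python sketch below is one possible reading, under a loudly stated assumption: it treats domain and range as constraints to verify, which is stricter than RDFS, where they license inferences about the types of the subject and object. The names PARENT, PROPERTIES, is_a, and check are hypothetical, and class names are written without the project prefix for brevity.

```python
# Direct superclass links stated in the Classes section (project classes only).
PARENT = {
    "Component": "Article", "Documentation": "Article",
    "Implementation": "Article", "Assumption": "Article",
    "Requirement": "Documentation", "Input": "Documentation",
    "Output": "Documentation", "Evidence": "Documentation",
    "Function": "Requirement", "Constraint": "Requirement",
    "Example": "Evidence",
}

# property name -> (domain, range), following Table 2.
PROPERTIES = {
    "hasConvention": ("Article", "rdfs:Resource"),
    "meetsRequirement": ("Article", "Requirement"),
    "equivalentToCCRequirement": ("Requirement", "rdfs:Literal"),
    "dependsOn": ("Article", "Article"),
    "hasComponent": ("Article", "Article"),
    "implements": ("Implementation", "Requirement"),
    "partlyImplements": ("Implementation", "Requirement"),
    "providesEvidenceFor": ("Evidence", "Claim"),
    "tests": ("Evidence", "Claim"),
}


def is_a(cls, wanted):
    """True if cls equals wanted or is one of its stated subclasses."""
    if wanted == "rdfs:Resource":
        return True  # everything is a resource
    while cls is not None:
        if cls == wanted:
            return True
        cls = PARENT.get(cls)
    return False


def check(subject_type, prop, object_type):
    """Assumption: domain and range are checked as constraints."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_type, domain) and is_a(object_type, range_)


if __name__ == "__main__":
    print(check("Implementation", "implements", "Function"))
    print(check("Documentation", "implements", "Requirement"))
```

A reviewer could run such a check over the statements that authors maintain alongside project content, flagging for example an implements statement whose subject is Documentation rather than an Implementation.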

Bibliography

[Ahm05] Ahmed, Kal. Topic Maps For Open (Source) Developers. XTech 2005 online version (http://www.idealliance.org/proceedings/xtech05/papers/04-03-04/).

[CC3] Common Criteria for Information Technology Security Evaluation. August 1999.

[Cla06] Clark, John L. WebCap: Web-based Capability List File Format and Model. 2006-03-23. Initial draft (http://infinitesque.net/projects/Legere/specifications/webcap.xhtml).

[Knu92] Knuth, Donald E. Literate Programming. CSLI. Stanford, California. 1992.

[LefWid99] Leffingwell, Dean and Don Widrig. Managing Software Requirements: A Unified Approach. Addison-Wesley Publishing Company. Reading, Massachusetts. 22 October 1999.

[LXR] Gleditsch, Arne Georg and Per Kristian Gjermshus. Linux Cross-Reference. Web documentation (http://lxr.linux.no/).

[RDFPrimer] Manola, Frank and Eric Miller. RDF Primer. 10 February 2004. W3C Recommendation 10 February 2004 (http://www.w3.org/TR/2004/REC-rdf-primer-20040210/).

[RDFSyntax] Klyne, Graham and Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation 10 February 2004 (http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/).

[Sch01] Schell, Roger. Information security: science, pseudoscience, and flying pigs. December 2001.

[Sch79] Schell, Roger. Computer Security: the Achilles’ Heel of the Electronic Air Force. Air University Review. Jan. - Feb., 1979.

[SPARQL] Prud'hommeaux, Eric and Andy Seaborne. SPARQL Query Language for RDF. 6 April 2006. W3C Candidate Recommendation 6 April 2006 (http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/).

[SPARQLResults] Beckett, Dave and Jeen Broekstra. SPARQL Query Results XML Format. 6 April 2006. W3C Candidate Recommendation 6 April 2006 (http://www.w3.org/TR/2006/CR-rdf-sparql-XMLres-20060406/).

[ViewVC] Stein, Greg. ViewVC: Repository Browsing. Web documentation (http://www.viewvc.org/).

[Wal02] Walsh, Norman. Literate Programming in XML. 15 October 2002. Version 1.2 (http://nwalsh.com/docs/articles/xml2002/lp/paper.html).

[WebArch] Jacobs, Ian and Norman Walsh. Architecture of the World Wide Web, Volume One. 15 December 2004. W3C Recommendation 15 December 2004 (http://www.w3.org/TR/2004/REC-webarch-20041215/).

[XMLCatalogs] Walsh, Norman. XML Catalogs. 2005-10-07. OASIS Standard V1.1, 7 October 2005 (http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html).


