xmLP — a Literate Programming Tool for XML & Text

Anthony B. Coates
abcoates@TheOffice.net
Zarella Rendon
zarella@xml-factor.com

Abstract

LitProg [Literate Programming] is a technique created by Donald Knuth to make computer programs readable and maintainable. This article introduces LitProg, demonstrates what a literate program looks like, and describes the LitProg tool “xmLP”, which can be used to literately develop program sources (or other control file sources) whose content is XML or text.

Keywords: Programming

Anthony B. Coates

Anthony B. Coates was most recently Leader of XML Architecture & Design in Reuters Chief Technology Office, and represented Reuters on technical working groups for FpML [Financial Products Markup Language] and MDDL [Market Data Definition Language]. Tony recommended XML technologies and design strategies to product groups within Reuters to co-ordinate the business usage of XML and leverage the best available solutions. This included both general XML technologies and specific vertical market XML languages. His background includes developing software for technical analysis & financial graphics, developing multimedia & Web applications, and theoretical & experimental physics. He has worked with XML since 1998, and Literate Programming since 1996. Tony is a past secretary of the Australian Java Users Group.

Zarella Rendon

Zarella Rendon is an XML/SGML expert with over 10 years of experience in the industry. Ms. Rendon began working with SGML transformations in 1992 as an engineer at Northern Telecom, and continued working with SGML, XML and related standards at ISOGEN International, where she was an applications engineer as well as a co-founder. Currently working as an independent consultant, Ms. Rendon offers a unique perspective and legacy of experience to the XML/SGML community at large. Ms. Rendon is a member of several XML industry groups, including OASIS and the W3C XSL Working Group, where she strives to help further the development, support, and use of public standards.

Anthony B. Coates [Financial XML Specialist]
Zarella Rendon [XML Factor & W3C XSL WG]

Extreme Markup Languages 2002® (Montréal, Québec)

Copyright © 2002 Anthony B. Coates and Zarella Rendon. Reproduced with permission.

Overview of Literate Programming

The legendary computer luminary Donald Knuth [Knuth WWW] once asked why nobody ever takes a computer program to bed to read. The answer was simple — most computer programs are unreadable. Comments are too few and too far between, and the simple process of reading becomes the painful process of reverse engineering. Knuth's solution was that programs should not be 90% code and 10% comments, they should be 90% descriptive text and 10% code. This was the start of Literate Programming, or “LitProg” [LP WWW].

Documentation of Programs

In the early days of computer programming, documentation within the source code of a program was an unaffordable luxury. Some systems actively stripped comments from code in order to save storage. Although storage is no longer a significant limit, programmers still spend little time documenting their code. There are several reasons for this. One is that programmers tend to be judged by whether their code works (or appears to), not by how well it is commented. Further, if project managers need to cut something from a development schedule, documentation can be removed without directly affecting the delivered functionality.

On a personal level, during the fleeting moments that a programmer writes a particular piece of code, that code can appear to be so clear and obvious that documentation seems all but unnecessary. It is only when the programmer returns to the code after a month or more, all moments of clarity lost in the past, that the impact becomes obvious. The situation is even worse when a different programmer has to work on the code. With no documentation to read, the code needs to be reverse engineered in order to understand its intent and logic, but this is difficult to do with sufficient accuracy. Too often, the code is rewritten in the belief that rewriting is easier than reverse engineering. This squanders any experience gained by the original programmer, while simultaneously introducing new bugs into the code base.

No documentation system can force an undisciplined or lazy programmer to document their code, but there are things that can be done to make the task less onerous. LitProg tools, which allow code fragments to be interspersed within the documentation, put the documentation for a piece of code right beside that code, in the same file. Compared to having code and documentation in separate files, this greatly increases the chance that as code is modified, the documentation is also modified to keep it up to date.

LitProg

A “literate program” (or “literate document”) is a human readable document containing short sections of code (known variously as “macros”, “chunks”, or “fragments”), written and ordered so that it can be understood easily by people. By contrast, most computer programs are ordered purely for the benefit of program compilers. In a literate program, source code fragments (or any textual fragments) can appear in any suitable order. When the literate document is processed, the code fragments are assembled into the order required to produce the source files by “tangling” the document, to introduce Knuth's terminology. Literate documents are also “woven” to convert them into a final documentation format. Traditionally the documentation format was TeX or LaTeX, but these days it can also be (X)HTML [(Extensible) Hypertext Markup Language], XSLFO [Extensible Stylesheet Language Formatting Objects], or PDF [Portable Document Format, aka Acrobat].
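The tangling step described above can be illustrated with a minimal sketch. It is not how any of the tools discussed in this paper are implemented; the `<<name>>` reference notation, the fragment names, and the `tangle` function are all assumptions made for the example.

```python
import re

# Hypothetical literate source: fragment name -> body. A body may refer to
# other fragments with <<name>> (a noweb-like notation chosen for this sketch).
FRAGMENTS = {
    "main": "<<imports>>\n\n<<body>>\n",
    "body": "print(math.sqrt(16))",
    "imports": "import math",
}

REFERENCE = re.compile(r"<<(.+?)>>")


def tangle(name, fragments, stack=()):
    """Expand fragment `name` recursively into the text a compiler would see."""
    if name in stack:
        raise ValueError("circular reference: " + " -> ".join(stack + (name,)))
    body = fragments[name]
    # Replace every <<reference>> with the expansion of the named fragment.
    return REFERENCE.sub(
        lambda match: tangle(match.group(1), fragments, stack + (name,)), body
    )


source = tangle("main", FRAGMENTS)
print(source)
```

The fragments are stored in an order that suits the reader ("body" before "imports"), while the tangled result has the order the compiler needs. Weaving would process the same fragments but emit documentation with cross-references instead of source text.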

LitProg Tools

What follows is a non-exhaustive list of LitProg tools. All of these tools predate XML. References to further information can be found in the bibliography.

WEB

WEB was Knuth's original LitProg system for Pascal. WEB directly marks up many of the syntactic features of Pascal, so that in creating a valid WEB document, a programmer has pre-validated much of the syntax of the code fragments. Note that WEB was written before the WWW [World Wide Web] came to prominence. Knuth's choice of name relates to his ideas of tangling and weaving. [Knuth 92]

CWEB

Also produced by Knuth's group, CWEB supported C rather than Pascal. It has now been extended to handle C++ and Java as well. [CWEB WWW]

FWEB

FWEB is a multi-language LitProg tool which is similar in spirit to WEB & CWEB. It was the first LitProg tool to support Fortran. [FWEB WWW]

noweb

noweb is the most well-known of the language-insensitive LitProg tools. These tools do not provide any syntactic support for any computer languages, and treat all code fragments as nothing more than text fragments. Language-insensitive LitProg tools can be used for any (textual) programming language or control files, so their loss of syntactic support is compensated for by a gain in flexibility. [noweb WWW]

FunnelWeb

FunnelWeb is another language-insensitive tool. Its unique feature is that its macros can have parameters, providing some of the power of a language pre-processor. xmLP, the XML LitProg tool described in this paper, takes its inspiration most strongly from FunnelWeb. [FunnelWeb WWW]

SWEB

SWEB is C. Michael Sperberg-McQueen's SGML LitProg tool. It was the first LitProg tool whose document format could feasibly be parsed by something other than the tool itself. [SWEB WWW]

Javadoc

Sun's Javadoc is a powerful tool for generating reference documentation from comments embedded in Java code, and has inspired similar tools for other programming languages. Javadoc is ideal for documenting the available methods & classes in a Java API [Application Programming Interface]. However, Javadoc is not a LitProg tool.

The documentation that Javadoc produces extends down only to the method signatures. It does not provide any support for documenting the workings of individual methods, and it does not allow the order in which methods and classes are presented to be controlled to improve readability. These are not criticisms, just observations. There is no such thing as one-size-fits-all documentation. Indeed, there are at least three major classes of documentation:

  1. User (functional) documentation;
  2. Detailed documentation within methods (functions) of the what, why, and how of the code;
  3. Reference documentation which lists the available methods (functions) in an API (library).

LitProg tools do a good job of generating detailed documentation. Javadoc does a good job of generating reference documentation. Neither provides sufficient support for generating good user documentation. So, not all documentation is the same, and no documentation tool is suitable for every type of documentation. This paper focuses on LitProg tools, and hence on the problem of creating detailed documentation of the workings of program code.

A LitProg Scenario

This paper was written as a literate program, using an extended version of the “Extreme Markup Languages 2002” DTD. The literate document was processed twice using an XML LitProg tool, “xmLP” [xmLP WWW], which is described in this paper. The literate document was first “tangled”, where the macros were expanded to produce the source files. It was then “woven”, where the macros were cross-referenced and this document was generated. Both processes need to resolve the macros in the document, but for different purposes.

No source code fragments were copied into the literate document, because the literate document is the original source material from which the source code files are produced.

The following scenario illustrates how literate programs can be a valuable tool for maintaining synchronized files. Note: if at any stage you want to jump ahead to read about the LitProg tool “xmLP”, you can go directly to Section 3 (“Using xmLP”). However, you are encouraged to read this section first to get a sense of what a LitProg tool needs to achieve.

Financial Time Series

This section demonstrates the documentation that xmLP creates, so what you see here is the output format. The input format is discussed in Section 3 (“Using xmLP”).

Consider the problem of representing the way a particular stock market share price changes over time (a “time series”). Taking a simplified view, a single daily price summary, which is an “event” from the time series, can be written as

Figure 1
xmLP Macro “Time Series Event Instance” [#1] =
<event date="2002-02-20">
  <open>85.70</open>
  <high>92.10</high>
  <low>81.37</low>
  <close>86.05</close>
  <volume multiplier="1000">811786</volume>
</event>

This macro is invoked in file #2 (Figure 16)

This macro is invoked in file #4 (Figure 20)

Note: this shows how xmLP XML macro definitions are “woven” into the documentation. Note the automatically generated cross-references.

Here, “date” is the date of the event, “open” is the opening (starting) price for that day, “high” and “low” are the maximum and minimum prices for that day (respectively), and “close” is the closing (final) price for that day. The “volume” is the number of shares traded during that day, and is commonly given in terms of thousands of shares traded.
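To make the meaning of the fields concrete, the following sketch reads the event with Python's standard library. It assumes that the number of shares traded is the “volume” value times its “multiplier”; the function name `read_event` and the dictionary layout are inventions of this example, not part of xmLP.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

EVENT_XML = """<event date="2002-02-20">
  <open>85.70</open>
  <high>92.10</high>
  <low>81.37</low>
  <close>86.05</close>
  <volume multiplier="1000">811786</volume>
</event>"""


def read_event(xml_text):
    """Return the daily price summary as a dictionary (None for missing values)."""
    event = ET.fromstring(xml_text)
    summary = {"date": event.get("date")}
    for name in ("open", "high", "low", "close"):
        child = event.find(name)
        summary[name] = Decimal(child.text) if child is not None else None
    volume = event.find("volume")
    if volume is None:
        summary["shares"] = None
    else:
        # Assumption: actual shares = stated volume x multiplier (default 1).
        summary["shares"] = int(volume.text) * int(volume.get("multiplier", "1"))
    return summary


summary = read_event(EVENT_XML)
print(summary["date"], summary["close"], summary["shares"])
```

Prices are read as `Decimal` rather than `float` so that values such as 85.70 keep their exact decimal representation.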

The purpose of this example is to produce both a DTD and a W3C XML Schema to describe this event structure, within the limits of what each of these schema technologies can do. A knowledge of DTD and W3C XML Schema constructs is assumed.

open, high, low, close

The “open”, “high”, “low”, and “close” elements each contain a decimal number. In the DTD, decimal numbers can only be represented as unconstrained text. However, a suitably named entity can be used to suggest to human readers that decimal values should be used.

Figure 2
xmLP Macro “DTD: decimal pseudo-definition” [#2] =
<!ENTITY % Decimal "#PCDATA">

This macro is invoked in macro #3 (Figure 3)

Note: this shows how xmLP text macro definitions are “woven” into the documentation.

From a machine perspective, this is nothing more than a syntactic nicety. However, it makes maintenance easier for humans (by making the intent clear), and that makes it worth doing.

Figure 3
xmLP Macro “DTD: financial elements” [#3] =
{DTD: decimal pseudo-definition[2], Figure 2}
<!ELEMENT open  (%Decimal;)>
<!ELEMENT high  (%Decimal;)>
<!ELEMENT low   (%Decimal;)>
<!ELEMENT close (%Decimal;)>

This macro is also defined in macro #6 (Figure 6)

This macro is invoked in file #1 (Figure 14)

Note: this shows how invocations (expansions) of one macro inside another are indicated and cross-referenced in the documentation.

Note: this macro is defined in multiple sections that are concatenated in document order to produce the complete content of the macro.

The W3C XML Schema datatypes contain a suitable decimal type, “xsd:decimal”, so the Schema equivalent is straightforward.

Figure 4
xmLP Macro “W3C XML Schema: financial elements” [#4] =
<xsd:element name="open" type="xsd:decimal"/>
<xsd:element name="high" type="xsd:decimal"/>
<xsd:element name="low" type="xsd:decimal"/>
<xsd:element name="close" type="xsd:decimal"/>

This macro is also defined in macro #7 (Figure 7)

This macro is invoked in file #3 (Figure 18)

volume

The “volume” element contains a non-negative integer value (number of shares traded). It also has a positive integer “multiplier” attribute, since the volume is typically given in units of thousands of shares. As before, in the DTD the values are simply unconstrained text.

Figure 5
xmLP Macro “DTD: integer pseudo-definitions” [#5] =
<!ENTITY % NonNegativeInteger "#PCDATA">
<!ENTITY % PositiveInteger "CDATA">

This macro is invoked in macro #6 (Figure 6)

The DTD nonetheless allows a default value of ‘1’ to be defined for the “multiplier” attribute, so that its use with the “volume” element is optional.

Figure 6
xmLP Macro “DTD: financial elements” [#6] =
{DTD: integer pseudo-definitions[5], Figure 5}
<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">

This macro is also defined in macro #3 (Figure 3)

This macro is invoked in file #1 (Figure 14)

The W3C XML Schema data types contain the necessary integer data types. Although the Schema version is longer, it defines the same structure for the “volume” element.

Figure 7
xmLP Macro “W3C XML Schema: financial elements” [#7] =
<xsd:element name="volume">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:nonNegativeInteger">
        <xsd:attribute name="multiplier" default="1" type="xsd:positiveInteger"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>

This macro is also defined in macro #4 (Figure 4)

This macro is invoked in file #3 (Figure 18)

event

The “event” element should contain no more than one each of the elements “open”, “high”, “low”, “close”, and “volume”. The order is not important. An “event” does not need to contain all of these elements, as any of the values could be undefined or unavailable. So each of the financial elements occurs 0 or 1 times in an “event”, in any order.

It is possible, but tedious, to create an XML DTD rule that enumerates all of the possible content options for “event”. Instead, it is simpler to make the DTD stricter than the W3C XML Schema, and have it enforce an (unnecessary) order on the financial elements.

Figure 8
xmLP Macro “DTD: event” [#8] =
<!ELEMENT event (open?, high?, low?, close?, volume?)>

This macro is also defined in macro #10 (Figure 10)

This macro is invoked in file #1 (Figure 14)
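To get a feel for why enumerating every content option would be tedious, the sketch below counts the distinct element sequences that the “0 or 1 times each, in any order” rule allows. The function name is hypothetical, and a hand-written DTD rule could share common prefixes, so the count is an indication of the size of the problem rather than the exact length of such a rule.

```python
from itertools import permutations

ELEMENTS = ("open", "high", "low", "close", "volume")


def content_sequences(names):
    """Yield every ordered selection of distinct names, including the empty one."""
    for size in range(len(names) + 1):
        yield from permutations(names, size)


sequences = list(content_sequences(ELEMENTS))
print(len(sequences))  # 1 + 5 + 20 + 60 + 120 + 120 = 326
```

The ordered content model in Figure 8 accepts only 32 of these 326 sequences (one per subset of the five elements), which is the sense in which the DTD is stricter than necessary.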

The “event” element is also required to have a “date” attribute to date the values that it contains.

Figure 9
xmLP Macro “DTD: date pseudo-definition” [#9] =
<!ENTITY % Date "CDATA">

This macro is invoked in macro #10 (Figure 10)

Figure 10
xmLP Macro “DTD: event” [#10] =
{DTD: date pseudo-definition[9], Figure 9}
<!ATTLIST event
  date %Date; #REQUIRED>

This macro is also defined in macro #8 (Figure 8)

This macro is invoked in file #1 (Figure 14)

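W3C XML Schema supports elements appearing in any order using “xsd:all”. For the “0 or 1 times each” part of the rule, each element reference inside “xsd:all” also needs minOccurs="0"; without it, as in the listing below, each element must appear exactly once.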

Figure 11
xmLP Macro “W3C XML Schema: event” [#11] =
<xsd:element name="event">
  <xsd:complexType>
    <xsd:all>
      <xsd:element ref="open"/>
      <xsd:element ref="high"/>
      <xsd:element ref="low"/>
      <xsd:element ref="close"/>
      <xsd:element ref="volume"/>
    </xsd:all>
    <xsd:attribute name="date" use="required" type="xsd:date"/>
  </xsd:complexType>
</xsd:element>

This macro is invoked in file #3 (Figure 18)

timeSeries

To represent a time series, a number of events are contained within a “timeSeries” element. A time series can contain any number of events, even zero. The dates of the events within a time series must be unique. A DTD cannot enforce that condition, and the W3C XML Schema shown here does not attempt to (doing so would require an identity constraint such as “xsd:unique”).

Figure 12
xmLP Macro “DTD: timeSeries” [#12] =
<!ELEMENT timeSeries (event*)>

This macro is invoked in file #1 (Figure 14)

Figure 13
xmLP Macro “W3C XML Schema: timeSeries” [#13] =
<xsd:element name="timeSeries">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="event" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

This macro is invoked in file #3 (Figure 18)
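A rule such as the uniqueness of event dates can also be checked in application code after parsing. The following is a minimal sketch of such a check, assuming the “timeSeries” structure defined above; the function name `duplicate_dates` and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<timeSeries>
  <event date="2002-02-20"/>
  <event date="2002-02-21"/>
  <event date="2002-02-20"/>
</timeSeries>"""


def duplicate_dates(xml_text):
    """Return the sorted list of dates used by more than one event."""
    root = ET.fromstring(xml_text)
    counts = Counter(event.get("date") for event in root.findall("event"))
    return sorted(date for date, count in counts.items() if count > 1)


repeated = duplicate_dates(SAMPLE)
print(repeated)  # ['2002-02-20']
```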

DTD Source Files

With all of its required sections now explained, the DTD is assembled from the component macros as follows.

Figure 14
xmLP File [#1]: src/timeseries.dtd =
<?xml version="1.0" encoding="utf-8"?>
{DTD: financial elements[3,6], Figures 3, 6}
{DTD: event[8], Figure 8}
{DTD: timeSeries[12], Figure 12}

Note: this shows how xmLP file macro definitions are “woven” into the documentation. These define the source files that are generated during “tangling”.

This produces the following source file:

Figure 15
<?xml version="1.0" encoding="utf-8"?>


<!ENTITY % Decimal "#PCDATA">

<!ELEMENT open  (%Decimal;)>
<!ELEMENT high  (%Decimal;)>
<!ELEMENT low   (%Decimal;)>
<!ELEMENT close (%Decimal;)>


<!ENTITY % NonNegativeInteger "#PCDATA">
<!ENTITY % PositiveInteger "CDATA">

<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">


<!ELEMENT event (open?, high?, low?, close?, volume?)>


<!ENTITY % Date "CDATA">

<!ATTLIST event
  date %Date; #REQUIRED>


<!ELEMENT timeSeries (event*)>

The sample instance file using the DTD (and containing just a single event) requires an appropriate “DOCTYPE” declaration.

Figure 16
xmLP File [#2]: src/timeseries-dtd.xml =
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE timeSeries SYSTEM "timeseries.dtd">
<timeSeries>
  {Time Series Event Instance[1], Figure 1}
</timeSeries>

This produces the following source file:

Figure 17
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE timeSeries SYSTEM "timeseries.dtd">
<timeSeries>
  <event date="2002-02-20">
    <open>85.70</open>
    <high>92.10</high>
    <low>81.37</low>
    <close>86.05</close>
    <volume multiplier="1000">811786</volume>
  </event>
</timeSeries>

Note: the tangled source file was inserted into this document automatically using an XSLT [Extensible Stylesheet Language Transformations] script.

W3C XML Schema Source Files

The W3C XML Schema is assembled from the component macros as follows.

Figure 18
xmLP File [#3]: src/timeseries.xsd =
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  {W3C XML Schema: financial elements[4,7], Figures 4, 7}
  {W3C XML Schema: event[11], Figure 11}
  {W3C XML Schema: timeSeries[13], Figure 13}
</xsd:schema>

This produces the following source file:

Figure 19
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="open" type="xsd:decimal"/>
  <xsd:element name="high" type="xsd:decimal"/>
  <xsd:element name="low" type="xsd:decimal"/>
  <xsd:element name="close" type="xsd:decimal"/>
  <xsd:element name="volume">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:nonNegativeInteger">
          <xsd:attribute name="multiplier" default="1" type="xsd:positiveInteger"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="event">
    <xsd:complexType>
      <xsd:all>
        <xsd:element ref="open"/>
        <xsd:element ref="high"/>
        <xsd:element ref="low"/>
        <xsd:element ref="close"/>
        <xsd:element ref="volume"/>
      </xsd:all>
      <xsd:attribute name="date" use="required" type="xsd:date"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="timeSeries">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="event" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The sample instance file using the W3C XML Schema (and containing just a single event) requires an appropriate “schemaLocation” declaration (in this case a “noNamespaceSchemaLocation” declaration). The “xmlns:xsi” declaration is suppressed for brevity, but generated in the actual source file.

Figure 20
xmLP File [#4]: src/timeseries-schema.xml =
<?xml version="1.0" encoding="utf-8"?>
<timeSeries xsi:noNamespaceSchemaLocation="timeseries.xsd">
  {Time Series Event Instance[1], Figure 1}
</timeSeries>

This produces the following source file:

Figure 21
<?xml version="1.0" encoding="utf-8"?>
<timeSeries xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="timeseries.xsd">
  <event date="2002-02-20">
    <open>85.70</open>
    <high>92.10</high>
    <low>81.37</low>
    <close>86.05</close>
    <volume multiplier="1000">811786</volume>
  </event>
</timeSeries>

Summary

What you have read in this section is a literate program which defines and describes the DTD and W3C XML Schema fragments required to handle a real-world problem. The code fragments in the macros appear within a human-readable context that quickly clarifies what those fragments do, why they are needed, and what their limitations are. Being able to view DTD fragments beside their equivalent Schema fragments makes it easy to compare the two approaches in detail.

Using xmLP

xmLP Goals & Approach

Having established the nature of a literate program, the way in which the “xmLP” tool supports LitProg can be described. Traditional LitProg tools provide the following:

  1. A complete (but tool-specific) markup language for literate programs, including both code and documentation sections;
  2. (Optionally) One or more code section markups for specific programming languages, or for other source file formats;
  3. The ability to “tangle” literate programs (assemble code sections) to generate the source files;
  4. The ability to “weave” literate programs into a final documentation format.

The advent of XML made it less attractive and less necessary to define and support custom markup languages for literate documents. To take advantage of XML, xmLP takes the following approach:

  1. Rather than define its own complete document markup, xmLP only defines a handful of XML elements which are intended to be used in conjunction with any suitable XML document markup, e.g. XHTML or DocBook;
  2. In order to keep xmLP from being constrained to a specific programming language, xmLP follows the traditional LitProg tools “FunnelWeb” and “noweb” (among others). These tools treat all code as text. This can be an advantage when you need to include a programming language or other file format that is not supported by your LitProg tool of choice. xmLP adds one small extra: support for well-formed XML fragments. These make it easier to generate well-formed XML correctly, by removing the risk of unmatched open or close tags;
  3. xmLP does not weave literate programs into a final documentation format as traditional LitProg tools do. Instead, the xmLP weaver adds just the contextual information needed to allow an XSLT stylesheet to format the code sections in the literate program.

Primarily, xmLP provides the business logic to deal with code macros, both for tangling and weaving. End users can concentrate on writing stylesheets that define the look and layout of their documentation, without having to worry about the semantics of macros and macro invocation, and without having to worry about building cross-reference information for the macros. These things are handled by xmLP.

Defining a macro

So, what does the xmLP markup look like? In the literate document from which this paper was woven, the macro corresponding to Figure 2 is actually written as

Figure 22
<lp:macro lp:usage="once" lp:final="true">
  <lp:name>DTD: decimal pseudo-definition</lp:name>
  <lp:text>
<!ENTITY % Decimal "#PCDATA">
</lp:text>
</lp:macro>

xmLP defines the following elements and attributes for creating macros. As mentioned previously, xmLP element and attribute definitions are added to a DTD/Schema/etc. to make them available directly in the authoring of a document. A fragment DTD containing the complete list of xmLP elements and attributes is included in the appendix in Section 5.

lp:macro

element: Indicates the definition of an xmLP macro.

lp:usage

attribute: One of “never” or “once” (the default) or “multiple”. Used to indicate how often the macro is to be invoked (used). This proves to be a valuable quality-control (sanity checking) measure, and is taken from “FunnelWeb”.

lp:final

attribute: One of “true” (the default) or “false”. If true, only this macro definition can have the given “lp:name”. If false, all macro definitions with the same “lp:name” are concatenated in document order to fully define the macro. Once again, taken from “FunnelWeb”.

lp:name

element: The name of the macro being defined. In principle, the name may contain XML elements, so that MathML expressions and the like can be used in macro names. In practice, xmLP currently applies the XPath “normalize-space” function to the macro name to generate a simple text name that is then used to decide whether two macro definitions have the same macro name or not. This is not ideal, but sufficiently good for most purposes.

lp:text

element: Indicates a plain text component of the macro definition.
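The sanity checks implied by “lp:usage” and “lp:final” can be sketched as follows. This is an illustration, not xmLP's XSLT implementation. It makes several assumptions: “multiple” is taken to mean one or more invocations, the usage of a concatenated macro is read from its first definition, and `" ".join(name.split())` is used as an approximation of XPath's “normalize-space”. The function and parameter names are hypothetical.

```python
from collections import Counter


def normalize(name):
    """Collapse whitespace runs, similar to the XPath normalize-space() function."""
    return " ".join(name.split())


def check_macros(definitions, invocations):
    """Return a list of problem descriptions (empty if everything is consistent).

    definitions: (name, usage, final) tuples in document order.
    invocations: names of the macros that are invoked anywhere in the document.
    """
    problems = []
    by_name = {}
    for name, usage, final in definitions:
        by_name.setdefault(normalize(name), []).append((usage, final))
    calls = Counter(normalize(name) for name in invocations)

    for name, defs in by_name.items():
        # A final definition must be the only one carrying that name.
        if len(defs) > 1 and any(final for _, final in defs):
            problems.append(f"{name}: defined {len(defs)} times but marked final")
        usage = defs[0][0]  # assumption: the first definition decides
        count = calls[name]
        allowed = {"never": count == 0, "once": count == 1, "multiple": count >= 1}
        if not allowed[usage]:
            problems.append(f"{name}: usage is '{usage}' but invoked {count} time(s)")

    for name in calls:
        if name not in by_name:
            problems.append(f"{name}: invoked but never defined")
    return problems


definitions = [
    ("DTD: financial elements", "once", False),
    ("DTD:  financial\n elements", "once", False),  # same name after normalizing
    ("Time Series Event Instance", "multiple", True),
]
invocations = [
    "DTD: financial elements",
    "Time Series Event Instance",
    "Time Series Event Instance",
]
print(check_macros(definitions, invocations))  # prints []
```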

Invoking a macro

Invoking (calling or expanding) a macro is done with another xmLP element, “lp:invoke”. This element stands in place of the macro being called, and is entirely replaced by it during tangling (assembly of the generated source code files). Taking as an example the macro defined in Figures 3 and 6, this macro is written over two concatenated definitions (note that “lp:final” is set to false), and uses “lp:invoke” to insert the contents of the macros defined in Figures 2 and 5:

Figure 23
<lp:macro lp:usage="once" lp:final="false">
  <lp:name>DTD: financial elements</lp:name>
  <lp:text><lp:invoke>
  <lp:name>DTD: decimal pseudo-definition</lp:name>
</lp:invoke>
<!ELEMENT open  (%Decimal;)>
<!ELEMENT high  (%Decimal;)>
<!ELEMENT low   (%Decimal;)>
<!ELEMENT close (%Decimal;)>

  </lp:text>
</lp:macro>

Figure 24
<lp:macro lp:usage="once" lp:final="false">
  <lp:name>DTD: financial elements</lp:name>
  <lp:text><lp:invoke>
  <lp:name>DTD: integer pseudo-definitions</lp:name>
</lp:invoke>
<!ELEMENT volume (%NonNegativeInteger;)>
<!ATTLIST volume
  multiplier %PositiveInteger; "1">

  </lp:text>
</lp:macro>

lp:invoke

element: Invokes an xmLP macro by name, replacing the “lp:invoke” element completely with the macro contents.
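The expansion of “lp:invoke” during tangling can be sketched in a few lines. xmLP itself is written in XSLT; the Python below is only an illustration of the same idea. The namespace URI is hypothetical (this paper does not state the one xmLP uses), the markup characters inside “lp:text” are escaped for the parser, only “lp:text” content is handled, and there is no check for circular invocations.

```python
import xml.etree.ElementTree as ET

LP = "urn:example:xmlp"  # hypothetical namespace URI, not taken from xmLP
NS = {"lp": LP}

DOC = f"""<doc xmlns:lp="{LP}">
<lp:macro lp:usage="once" lp:final="true">
  <lp:name>DTD: decimal pseudo-definition</lp:name>
  <lp:text>&lt;!ENTITY % Decimal "#PCDATA"&gt;
</lp:text>
</lp:macro>
<lp:macro lp:usage="once" lp:final="false">
  <lp:name>DTD: financial elements</lp:name>
  <lp:text><lp:invoke><lp:name>DTD: decimal pseudo-definition</lp:name></lp:invoke>&lt;!ELEMENT open (%Decimal;)&gt;
</lp:text>
</lp:macro>
<lp:macro lp:usage="once" lp:final="false">
  <lp:name>DTD: financial elements</lp:name>
  <lp:text>&lt;!ELEMENT volume (#PCDATA)&gt;
</lp:text>
</lp:macro>
</doc>"""


def qualified(tag):
    """Return the ElementTree form of an lp-prefixed tag name."""
    return "{%s}%s" % (LP, tag)


def name_of(element):
    """Whitespace-normalized text of the element's lp:name child."""
    return " ".join("".join(element.find("lp:name", NS).itertext()).split())


def collect(root):
    """Map each macro name to its lp:text parts, concatenated in document order."""
    macros = {}
    for macro in root.iter(qualified("macro")):
        macros.setdefault(name_of(macro), []).extend(macro.findall("lp:text", NS))
    return macros


def expand(name, macros):
    """Replace each lp:invoke with the expanded contents of the macro it names."""
    out = []
    for text in macros[name]:
        out.append(text.text or "")
        for child in text:
            if child.tag == qualified("invoke"):
                out.append(expand(name_of(child), macros))
            out.append(child.tail or "")
    return "".join(out)


tangled = expand("DTD: financial elements", collect(ET.fromstring(DOC)))
print(tangled)
```

The two definitions of “DTD: financial elements” are joined in document order, and the invoked macro's text appears where the “lp:invoke” element stood.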

XML macros

As well as plain text, xmLP macros can contain well-formed XML using “lp:xml”. As previously mentioned, well-formed XML fragments remove the risk of unmatched open or close tags. You can use plain text fragments (“lp:text”) to generate XML if you want, and it is sometimes useful to do so, but you take the risk of having unmatched tags in your generated XML source files. The following example corresponds to the output shown in Figure 1.

Figure 25
<lp:macro lp:usage="multiple" lp:final="true">
  <lp:name>Time Series Event Instance</lp:name>
  <lp:xml>
    <event date="2002-02-20">
      <open>85.70</open>
      <high>92.10</high>
      <low>81.37</low>
      <close>86.05</close>
      <volume multiplier="1000">811786</volume>
    </event>
  </lp:xml>
</lp:macro>

lp:xml

element: Indicates a well-formed XML component of the macro definition.
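The risk that “lp:xml” removes can be shown with a small check: a fragment kept as plain text can contain unmatched tags without anyone noticing until the generated file is parsed. The sketch below is an assumption-laden illustration (the function name and the wrapper element are inventions of this example), not part of xmLP.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as balanced XML content."""
    try:
        # Wrap the fragment so that several sibling elements are also accepted.
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        return True
    except ET.ParseError:
        return False


good = "<event><open>85.70</open></event>"
bad = "<event><open>85.70</event>"  # <open> is never closed

print(is_well_formed(good), is_well_formed(bad))  # True False
```

With “lp:xml”, a fragment like the second one could not be written in the first place, because the literate document itself would fail to parse.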

Defining output source files

xmLP uses the “lp:file” element to distinguish top-level macros that define output source files (there can only be one such macro for each output source file). These file macros have a file name rather than a macro name. Note that “lp:file” macros cannot be invoked by other macros. Namespaces are supported by xmLP using “lp:namespace”, as in the following example which corresponds to the output shown in Figure 18.

Figure 26
<lp:file lp:filename="src/timeseries.xsd">
  <lp:namespace lp:value="http://www.w3.org/2001/XMLSchema" lp:prefix="xsd"/>
  <lp:text>
<?xml version="1.0" encoding="utf-8"?>

</lp:text>
  <lp:xml>
    <xsd:schema><lp:invoke>
  <lp:name>W3C XML Schema: financial elements</lp:name>
</lp:invoke><lp:invoke>
  <lp:name>W3C XML Schema: event</lp:name>
</lp:invoke><lp:invoke>
  <lp:name>W3C XML Schema: timeSeries</lp:name>
</lp:invoke>
    </xsd:schema>
  </lp:xml>
</lp:file>

lp:file

element: Indicates the definition of an xmLP file macro.

lp:filename

attribute: The file name (or path) of the file being defined.

lp:namespace

element: Indicates that a namespace declaration should be added to the tangled XML.

lp:prefix

attribute: The namespace prefix to use.

lp:value

attribute: The namespace identifier (typically a URI).

To support W3C XML Schema, it is sometimes necessary to specify a Schema location using “lp:schemaLocation”, as in the following example which corresponds to the output shown in Figure 20.

Figure 27
<lp:file lp:filename="src/timeseries-schema.xml">
  <lp:schemaLocation lp:namespace="" lp:location="timeseries.xsd"/>
  <lp:text>
<?xml version="1.0" encoding="utf-8"?>

</lp:text>
  <lp:xml>
    <timeSeries><lp:invoke>
  <lp:name>Time Series Event Instance</lp:name>
</lp:invoke>
    </timeSeries>
  </lp:xml>
</lp:file>

lp:schemaLocation

element: Indicates that a (W3C XML) Schema location declaration should be added to the tangled XML.

lp:namespace

attribute: The namespace identifier (typically a URI). Can be empty.

lp:location

attribute: The Schema location URI.
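The effect of “lp:schemaLocation” on the tangled root element can be sketched as a small function. The empty-namespace case follows Figures 20 and 21. The non-empty case uses the standard “xsi:schemaLocation” pairing of namespace and location, and it is an assumption that xmLP generates exactly this. The function name is hypothetical.

```python
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_attributes(namespace, location):
    """Return the attributes to add to the root element of the tangled XML."""
    attributes = {"xmlns:xsi": XSI}
    if namespace == "":
        # No target namespace: point directly at the Schema document.
        attributes["xsi:noNamespaceSchemaLocation"] = location
    else:
        # Assumption: a namespace is paired with its Schema location.
        attributes["xsi:schemaLocation"] = f"{namespace} {location}"
    return attributes


print(schema_location_attributes("", "timeseries.xsd"))
```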

Implementation of xmLP

The current implementation of xmLP (version 1.1) is written as 600 lines of XSLT (plus stylesheets for particular formats like XHTML). This may change in future implementations. A potential improvement to xmLP would be to introduce parameterized macros (in the manner of “FunnelWeb”), but it is yet to be decided whether this is best done in XSLT or in a more general purpose programming language.

Conclusion

This paper has introduced literate programming, and indeed this paper is a literate program itself. It has demonstrated how programs (and other source files) can be defined within the natural flow of a human-readable document, rather than in the flow defined by a compiler. It has also introduced a simple LitProg tool, xmLP, which can be used to turn any XML document into a literate document.

The source files for this paper will be available at

http://xmLP.sourceforge.net/2002/extreme/

Appendix — a fragment DTD for xmLP

This DTD fragment is non-normative.

Figure 28
<?xml version='1.0' encoding='UTF-8' ?>

<!-- PUBLIC "+//IDN xmLP.org//DTD Sample Module for xmLP//EN" -->

<!-- The name of an "xmLP" macro. -->
<!ELEMENT lp:name ANY>

<!-- An invocation of an "xmLP" macro. -->
<!ELEMENT lp:invoke (lp:name)>

<!-- Text within an "xmLP" macro. -->
<!ELEMENT lp:text (#PCDATA | lp:invoke)*>

<!-- Balanced XML within an "xmLP" macro. -->
<!ELEMENT lp:xml ANY>

<!-- An "xmLP" macro. -->
<!ELEMENT lp:macro (lp:name , lp:namespace* , (lp:text | lp:xml)*)>
<!ATTLIST lp:macro lp:usage  (never | once | multiple )  'once'
                   lp:final  (true | false )  'true' >

<!-- An "xmLP" namespace declaration. -->
<!ELEMENT lp:namespace EMPTY>
<!ATTLIST lp:namespace lp:prefix NMTOKEN  #REQUIRED
                       lp:value  CDATA    #REQUIRED >

<!-- An "xmLP" schemaLocation declaration. -->
<!ELEMENT lp:schemaLocation EMPTY>
<!ATTLIST lp:schemaLocation lp:namespace CDATA #REQUIRED
                            lp:location CDATA #REQUIRED >

<!-- An "xmLP" output file. -->
<!ELEMENT lp:file ((lp:namespace | lp:schemaLocation)* , (lp:text | lp:xml)*)>
<!ATTLIST lp:file lp:filename CDATA  #REQUIRED >

<!-- "xmLP" block elements. -->
<!ENTITY % lpBlock "lp:macro | lp:file">

Bibliography

[CWEB WWW] CWEB [LitProg tool], http://sunburn.stanford.edu/~knuth/cweb.html

[FunnelWeb WWW] FunnelWeb [LitProg tool], http://www.ross.net/funnelweb/

[FWEB WWW] FWEB [LitProg tool], http://w3.pppl.gov/~krommes/fweb_toc.html

[Knuth 92] “Literate Programming” by Donald Knuth, 1992, ISBN 0-937073-80-6, http://www-cs-faculty.stanford.edu/~knuth/lp.html

[Knuth WWW] Donald Knuth, http://www-cs-faculty.stanford.edu/~knuth/

[LP WWW] Literate Programming, http://www.literateprogramming.com/, http://www.loria.fr/services/tex/english/litte.html

[noweb WWW] noweb [LitProg tool], http://www.eecs.harvard.edu/~nr/noweb/#top

[SWEB WWW] SWEB [LitProg tool], http://tigger.uic.edu/~cmsmcq/tech/sweb/sweb.html

[xml-litprog-l] xml-litprog-l [mailing list], http://groups.yahoo.com/group/xml-litprog-l/

[xmLP WWW] xmLP [LitProg tool], http://xmLP.sourceforge.net/


