Further development of OpenJade

Javier Farreres
farreres@lsi.upc.es
Cristian Tornador

Abstract

This paper explains plans for the immediate development of the OpenSP/OpenJade tool-set, its objectives and philosophical approach, and the partial results already obtained. Four works in progress are explained: extension of the style language towards complete coverage of page and column models, and plans to include bidirectional communication between formatter and style engine; full coverage of the grove model for the SGML propset; completion of the query feature together with the full implementation of the transformation language; and integration of OpenJade into the Apache project. Finally, one application of OpenJade being developed is presented, and future work is pointed out.

Keywords: Processing; Open Source

Javier Farreres

University Professor and SGML/XML consultant for publishing companies

Cristian Tornador

Student

Javier Farreres [Politecnic University of Barcelona (UPC), Department of Computer Languages and Systems]
Cristian Tornador [UPC]

Extreme Markup Languages 2002® (Montréal, Québec)

Copyright © 2002 Javier Farreres and Cristian Tornador. Reproduced with permission.

Introduction

OpenJade [OJ] is the evolution of Jade [JADE], the tool first developed by James Clark which implements part of the ISO DSSSL [Document Style Semantics and Specification Language] standard [DSSSL]. Along with DSSSLprint and NEXTPublisher [next] by Next Solution, OpenJade is one of the few implementations of this standard, and it is the tool used by the DocBook project [DBK] for processing the DSSSL style-sheets.

DSSSL is a standard related to SGML [sgml]. It specifies a programming language based on Scheme [scheme], a functional language, and a processing model divided into several layers. Any SGML document is analyzed, and a data structure called a grove is created to represent the document. The standard gives a complete description of the grove data structure and provides ways to use only portions of the structure for particular needs by means of grove plans. With the transformation language, this grove can be modified in every way, and with the style language the grove can be given a suitable external form. There is also a query language, used by both the transformation language and the style language. The text of a DSSSL document is written using SGML notation.

Even though DSSSL is a powerful standard for processing marked-up text, it is little used, mainly because few tools support it. Activity in the OpenJade project has also been low, due to the general shift towards XSL. For this reason, in order to put DSSSL in the place it deserves, OpenJade must be extended and completed.

Although it is too soon to talk about it now, complete implementation of the DSSSL standard is only a step towards implementation of the HyTime [HyT] standard.

As explained by Eliot Kimber [Kim93]:

HyTime is an ISO standard that defines an architecture for creating hypertext and hypermedia applications. HyTime primarily addresses the problem of hyperlinking as a problem of addressing, in other words, locating objects in space or time. A key aspect of addressing is the use of queries to find things based on their properties.

SGML, complicated as it is, is still too simple for some problems found in the publishing area. Perhaps HyTime is the solution, or only one more step towards it. Even though there is an implementation of the HyTime standard, GroveMinder [grmi], it doesn’t use or understand DSSSL. HyTime processing would be easily implemented on top of a complete DSSSL engine, and in fact, some parts of DSSSL, like the query language, are referenced in the HyTime standard.

In its former state, before this extension was started, OpenJade had a proprietary transformation language. It didn’t have the ISO DSSSL standard transformation language, however. The grove model supported was very limited. The query language was almost completely implemented, and the style language didn’t support the more complex flow objects and features.

It can be deduced from the above that, during the development of the initial Jade, stress was put on the style part, aiming for a fast spread of the most useful part of DSSSL, the formatting of documents. Only a minimum of the other parts was implemented: just the ones that were mandatory.

After James Clark dropped development, OpenJade was created as a project on SourceForge [SF], and for some time its development progressed slowly.

A few months ago, the work was resumed with the aim of fully implementing the DSSSL standard in OpenJade as a first stage, and then continuing towards implementation of modules to process XSLT, and towards implementation of the HyTime ISO standard.

At the start of this extension, various discussions took place on the DSSSL distribution list [DSL] and the OpenJadeDevel distribution list [OJDL], in order to evaluate the different parts that were more important. It was evident after the discussions that a page model was a priority due to its immediate utility, a groves model was a priority for the next step of jumping to the HyTime standard, and a transformation language was necessary to implement the XSLT extension over it. Apart from these points that are part of the DSSSL standard, it was later pointed out that the inclusion of OpenJade as a module in the Apache project would be a good enhancement and would definitely help the spread of the standard for processing SGML and XML documents. There are, of course, other parts in OpenJade waiting to be enhanced, and they will be pointed out at the end of the paper.

This paper shows the many ideas that appeared during the discussions that took place in the distribution lists and the work done on the different parts of OpenJade up to the moment.

Our approach when writing this paper was, with intention, to avoid argument and comparison with similar XML standards. There is a reason for it: whereas XSLT is working, DSSSL is not working right now. It is our concern that DSSSL development has been forgotten for too long, although we consider it to be a very powerful and well-thought standard. Not referencing XML standards was a deliberate decision and not an oversight.

Objectives

The most critical problem was defining the general mission or goal of OpenJade’s extension.

All agreed that it would be desirable for someone to do more work on the OpenSP-OpenJade toolkit. Most correspondents seemed to favor a significant re-architecting of the underlying software. In effect, SP and Jade would become parts of a modular toolkit known as OpenJade. The project and the Modular OpenJade Object Library target product would effectively merit increasing the edition number from 1 to 2 (hence OJ2). However, statements were also made indicating support for evolutionary extension of the current code base.

There were thus two goals: extension and revision on the one hand, and re-architecting on the other. Both would be developed more or less together.

The main effort would be done by students who, having finished their studies in computer science, need to develop a final project in order to obtain their degrees. The flow of students is more or less constant, and they are very motivated by a project of the kind OpenJade is offering.

Philosophy

The DSSSL standard was finally published in 1996, that is, six years ago. Many hopes were placed in it, but tools for processing it have not appeared since then. Only partial implementations of some parts of the standard exist. It is time to complete what James Clark started. Only when a working product exists that supports the standard will it be possible to evaluate its limitations and potential. The hope is that in a year or so, the many parts that are pending in OpenJade will be finished and available. By the time Extreme 2002 takes place, only part of the extensions will be available, but it will be possible to test some of them.

The DSSSL standard defines a set of primitives that is sufficient for complete processing of any kind. It also defines a number of derived procedures intended to optimize processing, but these should not be seen as the largest set of derived procedures that could be defined. For example, one part of the query language is defined specifically for SGML processing, but it takes into consideration only the default grove plan. More standard procedures could be added to deal with groves generated from alternate SGML plans, so that people don’t invent different ways of doing the same common tasks. It is easy to see that, if the DSSSL standard were revised, many new standard procedures could be defined to process groves generated by alternate plans more efficiently.

For this reason, the purist approach of only implementing the DSSSL standard seems a bit limiting. But, as the idea is to finally have a working product in the least time possible, it seems quite adequate to implement just the DSSSL definition as a first short term objective, and then later extend its capabilities.

ISO standards are reviewed every 5 years for renewal, revision, or cancellation. With the lack of activity on DSSSL implementations, we are concerned that it could lose its status as an active ISO standard. The effort being made might encourage others (especially non-coders) to get involved with their national standards bodies as they review the standard.

Management issues

At present there are four active OpenJade projects being developed by some of the authors, who are students developing their final projects, and whose partial results are shown in this paper. The projects pose interesting problems, more in terms of efficiently implementing standard behavior than in theoretical terms.

Jade was initially written in C++. There is no point in rewriting it, but it is considered important to study its class design. The aim is to redefine Jade as a modular tool, which requires good encapsulation of classes. At present there are two tools working together: OpenSP and OpenJade. There are two possible directions. One is to fuse them into a single tool. The other is to split them into smaller tools (OpenSP, OpenGrover, OpenTransf, OpenStyler, etc.), with the different modules exchanging objects via some CORBA [corba] implementation. This approach has already been tested successfully by J. Reynolds and Eliot Kimber [Reynolds&Kimber 2002].

This modularization would allow parts of OpenJade to be used independently. For example, an SGML editor could use OpenSP and OpenGrover to generate a grove structure from the text being written. This calls for good documentation of the classes and methods, and for this reason the OpenJade Internals document [OJID] will be extended as a result of the work. UML [UML] will be used to represent class diagrams, as it is now a standard for object-oriented design.

Although rewriting the code is not being considered, reuse of existing freeware Scheme implementations is considered a priority for the expression language, as it would allow the storage of precompiled style-sheets, thus enhancing processing speed. Also, when design problems are detected via the UML class diagrams, some partial reorganization and rewriting will be necessary in order to repair the design. All modifications will be directed towards the encapsulation of classes.

Anyway, OpenJade continues under the paradigm of open source development, and non-student volunteers are always welcome. Those interested in offering their collaboration should indicate their interest in some part of the pending development, and they will either be assigned responsibility for the part, or be redirected to the responsible person.

A procedure has been established for extending OpenJade. OpenJade is developed under the open source paradigm, and this usually involves the coordinated work of many individuals. In this scenario, documentation is a must. A first aim of the extension is to correct the lack of documentation in the OpenJade project. UML diagrams will be created that represent OpenJade’s present code for all the parts being extended, and later the diagrams will be modified to accommodate the extensions. This process, in UML terms, is called reverse engineering. It will also help detect possible defects in the class design, which can later be resolved. Related to the documentation problem, it must be remembered that one of the aims of this extension work is to allow the use of separate modules of OpenJade, and that calls for good class design, isolation, and encapsulation. From this point of view, added code is not useful if it is not documented. All future developments should be reflected in the documentation, with the modification of the appropriate diagrams.

Style Language

Limitations of the current system

The most frequent complaint is that OpenJade implements only a subset of the DSSSL page layout facilities, so full page layout is the most often requested feature. Because the page feature is not fully implemented, Jade is left as little more than a toy.

Another important feature is bidirectional communication between the formatter and the style-engine, which would allow conditional formatting that depends on the format of other parts of a document, as with dictionary heads.

OpenJade lacks implementation of the query construction rules, which require the optional query feature. With these construction rules, a style rule can be specified using the complete capabilities of the query language.
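
The difference between selecting a rule by element name and selecting it by query can be illustrated with a minimal sketch. It is not DSSSL syntax: the node shape, the `RULES` table, and the use of list order as priority are all assumptions made for illustration.

```python
# Hypothetical sketch: style rules selected by arbitrary query predicates.
# Nodes are plain dicts with a "gi" (element name); this is not OpenJade's API.

def first_para_after_title(node, siblings, index):
    """Query predicate: a 'para' whose preceding sibling is a 'title'."""
    return node["gi"] == "para" and index > 0 and siblings[index - 1]["gi"] == "title"

# Each rule pairs a query predicate with a style. List order stands in for
# rule priority here, which is an assumption of this sketch.
RULES = [
    (first_para_after_title, {"indent": 0}),
    (lambda node, siblings, index: node["gi"] == "para", {"indent": 2}),
]

def style_for(siblings, index):
    """Return the style of the first rule whose query matches the node."""
    node = siblings[index]
    for matches, style in RULES:
        if matches(node, siblings, index):
            return style
    return {}
```

The first rule cannot be expressed by element name alone, since it depends on the node's context; this is the kind of selection that query construction rules would make available.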

Goals

In the publishing area, the page feature is of utmost interest. Definitely, high-end publishing requires complex page layout and sequencing. Experience indicates the style language is capable of expressing all the formatting needs encountered today.

Bidirectional communication is also very important, and it is closely tied to complex page formats. It is required for high-end formatting. The index words at the top of the page in dictionaries are an example of this feature. These words depend on the elements that are finally placed on the page, which calls for some way to query which flow objects are placed on a page. It must not be confused with the bidi feature, which allows bidirectional text flow (mixing of right-to-left text with left-to-right).
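
The dictionary example can be sketched as follows, assuming the formatter can report back which headword ended up on which page. The function name and the `(page, headword)` input shape are hypothetical and only stand in for that feedback channel.

```python
# Hypothetical sketch: compute dictionary running heads from placement
# information that a formatter would report back to the style engine.

def running_heads(placements):
    """Map each page number to the (first, last) headword placed on it.

    `placements` is an iterable of (page_number, headword) pairs in
    document order.
    """
    heads = {}
    for page, word in placements:
        # Keep the first word already recorded for this page, if any,
        # and overwrite the last word with the current one.
        first, _ = heads.get(page, (word, word))
        heads[page] = (first, word)
    return heads
```

The point of the sketch is that the input is known only after page breaking, which is why the style engine cannot compute these heads without information flowing back from the formatter.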

Query construction rules would also be a powerful tool for writing style-sheets.

Development aspects

The architecture of OpenJade closely follows the structure of the DSSSL standard: the only purpose of the style language is to specify a FOT [Flow Object Tree], and that is what OpenJade does. Then, various back-ends convert the abstract FOT into typesetting code in different languages, e.g., RTF, TeX, MIF, etc.
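
This separation between one abstract tree and several back-ends can be sketched in a few lines. The `FlowObject` class and the two toy back-ends below are illustrative assumptions; they do not reproduce OpenJade's C++ classes or any of its real output formats.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a flow object tree consumed by two back-ends.

@dataclass
class FlowObject:
    kind: str                      # e.g. "sequence" or "paragraph"
    text: str = ""
    children: list = field(default_factory=list)

def emit_text(fo):
    """Back-end 1: emit paragraph text, one paragraph per line."""
    if fo.kind == "paragraph":
        return fo.text + "\n"
    return "".join(emit_text(child) for child in fo.children)

def emit_outline(fo, depth=0):
    """Back-end 2: dump the tree structure with indentation."""
    line = "  " * depth + fo.kind + "\n"
    return line + "".join(emit_outline(child, depth + 1) for child in fo.children)

fot = FlowObject("sequence", children=[
    FlowObject("paragraph", "Hello"),
    FlowObject("paragraph", "World"),
])
```

Both functions walk the same `fot` value, so adding an output format means writing another walker rather than touching the code that builds the tree.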

Getting this information into the Flow Object Tree isn’t the difficult part; rather, the problem seems to lie in the back-end, which must actually do the page layout based on the FOT. For the separate PDF back-end, it would be possible although most challenging, but there are doubts it would be possible at all for RTF, plain text, etc. Only the TeX engine could certainly cope with it easily.

If bidirectional communication between formatter and style-engine were implemented, it would also allow for very lightweight back-end formatters, since it would incorporate parts of the formatting into the OpenJade engine. This would make it easier to write new back-ends for other external formats.

Results

The extension of the style language is giving its first results, but so far only in the form of preliminary documentation. OpenJade’s code has been inspected, and UML diagrams for the style part of the tool have been created. They are available on the web for review [diag]. At this moment, the work is at the stage of adding new flow object classes and attaching them to the FOT, in parallel with work on the TeX back-end so that it generates output corresponding to what is specified.

Complete groves model

Limitations of the current system

The grove model as specified in the DSSSL and HyTime standards has four components: modules, classes, properties, and data types. Each module defines classes, and the classes are composed of properties, each of a given data type. Some modules introduce new properties for classes defined in other modules. The designer can specify a grove plan, indicating the inclusion of some modules; the grove created will follow this plan and will include only the classes and properties defined by the included modules. Modules also have an inclusion relation, which means that including a module causes the additional inclusion of the modules on which it depends. By default, there is a set of modules which are always included, and they form the default grove plan, which is used if no alternate plan is specified.

OpenJade has only limited support for the grove model. It has no grove plan handler, and only the default grove plan is supported.
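
How a grove plan expands into a set of modules and visible properties can be sketched as below. The sketch assumes that including a module pulls in the modules it requires; the module names and the `MODULES` table are hypothetical and are not taken from the SGML property set.

```python
# Hypothetical module table: each module lists the modules it requires and
# the (class, property) pairs it contributes. Names are illustrative only.
MODULES = {
    "base": {"requires": [], "properties": {("element", "gi"), ("element", "content")}},
    "attrs": {"requires": ["base"], "properties": {("element", "attributes")}},
    "links": {"requires": ["attrs"], "properties": {("element", "referent")}},
}

def resolve_plan(requested):
    """Return the modules of a grove plan, following inclusions transitively."""
    included = set()
    pending = list(requested)
    while pending:
        name = pending.pop()
        if name in included:
            continue
        included.add(name)
        pending.extend(MODULES[name]["requires"])
    return included

def plan_properties(requested):
    """Collect the (class, property) pairs visible under a grove plan."""
    props = set()
    for name in resolve_plan(requested):
        props |= MODULES[name]["properties"]
    return props
```

Note that the hypothetical `attrs` module adds a property to the `element` class, which it does not define itself, mirroring the case of modules that extend classes defined elsewhere.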

Goals

There was a consensus in the discussions on the distribution lists that Groves should be explicitly made the centerpiece of OJ2. The Grove engine should go in a separate module. The module itself should be able to stand alone as a tool or toolkit library that could be used by third-parties. The grove engine is expected to be the most important part of the OJ2 product.

Full grove implementation is essential for adding any HyTime addressing and linking capabilities. It would also allow other possibilities, such as automatic DTD analysis and transformation.

By allowing additional front-end notation processors to supply input groves, OpenJade could become a general-purpose transforming wizard. For instance, RTF [Rich Text Format] documents could be delivered as groves to a DSSSL transformation. (GroveMinder has the capability to process other notations to create groves, but does not use DSSSL for transformation.)

Flexible grove plans would allow creation of “lite” groves that could enhance performance in some situations.

The conception of groves is what finally led to the breakthrough that unified DSSSL and HyTime. So, for better or for worse, any implementation of DSSSL or HyTime must have full grove-processing capabilities.

The grove module of OpenJade should be usable in many ways outside OpenJade. The Grove View [grvi] product already made use of the grove library from Jade to show graphically the structure of the grove for a particular SGML document, but this library was not designed to be used this way. Now, the grove module is going to be designed with this distributed usability in mind. Thus, a text editor could implement its text on top of this structure, and a database could be modeled to accept documents represented by the groves that the OpenJade grove module creates.

Development aspects

Adding new node classes and properties to OpenJade shouldn’t pose a problem since it already has a part implemented and they can just be mimicked. The problem perhaps will come with the grove plan handler, the part which has to decide which view of the complete grove should be constructed according to the grove plan specified.
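
One simple way to think about the grove plan handler is as a filter over a complete grove. The sketch below assumes nodes are plain dictionaries and that the plan has already been reduced to a set of visible `(class, property)` pairs; both are simplifications for illustration, not OpenJade's design.

```python
# Hypothetical sketch: derive a plan-specific view from a complete grove.

def view_node(node, visible):
    """Return a copy of a grove node keeping only plan-visible properties.

    `node` is a dict with a "class" name, a "properties" dict and optional
    "children"; `visible` is a set of (class, property) pairs.
    """
    cls = node["class"]
    return {
        "class": cls,
        "properties": {
            name: value
            for name, value in node["properties"].items()
            if (cls, name) in visible
        },
        "children": [view_node(child, visible) for child in node.get("children", [])],
    }
```

A real handler would more likely avoid building the hidden properties in the first place, which is where the performance benefit of a reduced grove would come from; the filter form is shown only because it is the shortest way to state the intended result.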

Results

At the moment, the work on the grove part is at the stage of code analysis and generation of UML diagrams. UML diagrams for this part are expected soon.

After this work is finished, classes for all the nodes of the complete SGML grove plan will be created, and a grove plan handler will be added to OpenJade to allow style-sheet designers to select views of the complete grove by specifying a grove plan.

Transformation language

Limitations of the current system

There is absolutely no transformation language in OpenJade in the terms defined in the DSSSL standard. Some extensions to the style language were implemented which helped SGML-to-SGML document transformation. This work is intended to deprecate (not obsolete) these ad-hoc extensions and focus on the transformation language as specified in the text of the standard.

Goals

The transformation language could be used to implement style language solutions, too. That is, style language solutions might be easier to write and more elegant if coders could use transformation language tools. For example, the transformation language could be used to preprocess SGML files to improve their structure for formatting.

The consensus has been toward a very Grove-centric approach and Grove-centric solution for OpenJade. This implies that the project will need a grove-aware query language. SDQL is designed for this function.

The transformation language would have the potential to support many wonderfully useful applications. But the market has not demonstrated a need for this sort of thing. The fact is, for any “particular” SGML transformation, there are at least half a dozen “good” ways to accomplish it, and XSLT is quickly becoming about the easiest way to specify such transformations. Developers do not hesitate to string together a long sequence of preprocessing and/or postprocessing steps, with XML+XSLT in the middle. Two or three other scripting languages may be involved in the process, such as ASP, SQL, etc., and neither the developers nor the system architects are bothered by polyglot solutions. Companies big and small are offering integrated, but inevitably proprietary and non-standards-based, solutions to these problems.

A DSSSL+HyTime grove-based transformation system would drastically simplify these sorts of applications and raise the capability of “single-source publishing” to an entirely new level.

DSSSL is “implicitly” a standard for a family of HyTime applications. The philosophy of the transformation language as explained in the standard is not just to limit the work to SGML-to-SGML transformations, but grove-to-grove transformations. In a HyTime concept of a document, SGML is not everything. A grove can contain the data from a graphic or sound file, and it can be queried in the same terms as SGML documents by means of the query language.

But the full model is SGML-to-grove-to-grove(s)-to-whatever, with the transformation language doing the grove-to-grove(s) part. What users do with the output grove is only limited by their imagination (and programming skills!). This provides a standard, universal model for practically all data and document transformation needs, using a single consistent data model and processing paradigm.

Development aspects

The transformation process is specified by a collection of associations. Each association begins with a query expression from the query language. Almost the whole query language is implemented in OpenJade, but the few optional parts that are not implemented could be finished before starting with the transformation language, in order to have the full capabilities of the query language at reach. Anyway, the lack of these optional features doesn’t prevent implementation and use of the transformation language.
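
The idea of a transformation as a collection of associations can be sketched as follows. Each association here is a pair of functions: a query that selects nodes from the source grove and a transform applied to each selected node. This is a deliberately reduced model with hypothetical names; it does not reproduce the syntax or the result-grove construction rules of the standard.

```python
# Hypothetical sketch: a transformation as a list of (query, transform) pairs.

def descendants(node):
    """Yield a node and all nodes below it, in document order."""
    yield node
    for child in node.get("children", []):
        yield from descendants(child)

def run_associations(root, associations):
    """Apply each association: the query selects nodes from the source
    grove and the transform maps each selected node to an output item."""
    results = []
    for query, transform in associations:
        for node in query(root):
            results.append(transform(node))
    return results

# Illustrative source grove and one association that renames para to p.
source = {"gi": "doc", "children": [
    {"gi": "para", "text": "one"},
    {"gi": "para", "text": "two"},
]}
associations = [
    (lambda root: [n for n in descendants(root) if n["gi"] == "para"],
     lambda n: {"gi": "p", "text": n["text"]}),
]
```

Because the selection step is an ordinary query, any optional query feature that is implemented later becomes usable in associations without changing the driver loop.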

DSSSL transformation looks over-complicated, but only because the DSSSL transformation language is, conceptually, very sophisticated and challenging. Nothing less would be so powerful. Unfortunately, to understand it fully requires: a) good grasp of functional programming methods, especially recursion on tree structures; and b) an understanding of the abstract structure of documents according to the grove model. Very few people, apparently, have those prerequisite skills. This could be corrected by good teaching materials.

In order to see the immediate utility of the transformation language, some transformations in the style of DocBook2HTML will be created.

The built-in back-ends for Jade made its implementation of the style language immediately useful. Some additional built-in back-ends for the transformation language would be essential. At minimum, the language should be able to emit SGML documents (including declarations and DTDs) and canonical grove representation. Other possibilities would be STEP Express instances and CGM files. Some preliminary work on grove plans for both of these notations was probably done a few years ago, but it is possibly very difficult to recover or resuscitate these efforts.

Results

The development of this part is only at a preliminary stage. Extension of the query language shouldn’t pose any problem, as there is little left to do. But the transformation language will have to be implemented from scratch. However, although it looks complicated, the transformation language specification is quite short, and it has very few constructs. Perhaps it will take less time than it would appear at first sight.

Apache integration

Limitations of the current system

Anything that would make OpenJade more directly usable as the back-end of an http server would make it more attractive to a wider market. Some useful features would be: persistent grove representation, the ability to invoke in-memory “compiled” transformations, and a database-to-grove transformation component.

If, in the best of all possible worlds, OpenJade came with a mini-http server built in, users could have an instant SGML server that would eliminate the need for “any” transformation or preprocessing of SGML source data for web publishing. It would support HTML viewing or on-demand composition and delivery of PDF files from the same SGML source. With additional front-end notation processors, it could be a complete “information server”, as envisioned (and partially implemented) by GroveMinder.

Goals

The aim for this part of the work is to integrate OpenJade into the Apache project as a post filter. It would allow transmission of SGML files over the web and processing of SGML documents on the fly. In order to speed up the processes involved, when this integration is effective, optimizations will be developed so that style-sheets and DTDs can be compiled and stored, rather than interpreted each time.
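
The optimization mentioned above amounts to caching a compiled form and reusing it while the source file is unchanged. A minimal in-memory sketch follows; `compile_fn` is a hypothetical stand-in for whatever would compile a style-sheet or DTD, and nothing here reflects an existing OpenJade or Apache interface.

```python
import os

# Hypothetical sketch: cache compiled style-sheets keyed by path, and
# recompile only when the file's modification time changes.
_cache = {}

def load_compiled(path, compile_fn):
    """Return the compiled form of the file at `path`.

    `compile_fn` takes the file's text and returns its compiled form.
    """
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    with open(path, encoding="utf-8") as handle:
        compiled = compile_fn(handle.read())
    _cache[path] = (mtime, compiled)
    return compiled
```

In a server process that handles many requests, the first request for a style-sheet pays the compilation cost and later requests reuse the stored result, which is the behaviour the text describes as compiled rather than interpreted each time.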

Development aspects

In order to develop this part, solutions for the integration of the XSLT post filter into Apache will be studied and compared. It is believed that some of the efforts made for XSLT should help in this task.

Results

The state of this part is very preliminary. The implementation of the XSLT post filter is being studied and evaluated to see whether it can help with the same task for OpenJade in Apache.

An application of OpenJade: book e-server

Another group of three students is developing a concrete application of OpenJade.

Following the philosophy of Grove View [grvi], this project plans to develop a database to store documents marked up in SGML. The grove model will be used. The idea is to have OpenJade as an SGML parser and grove constructor. Afterward, a database will be fed with the grove created by OpenJade, and the grove structure will be inserted in a relational data model. This will work in cooperation with a web page. The final system is expected to allow a user to insert any document conforming to any DTD. Copies of books could then be ordered, and formatting parameters would be specified by the user. The tool will also register book authorships and requests together with client data.
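
Storing a grove in a relational model can be sketched with two tables, one for nodes and one for their properties. The schema, the dictionary shape of the nodes, and the function names are assumptions made for illustration; the project's actual data model is not described in this paper.

```python
import sqlite3

# Hypothetical sketch: store a grove in two relational tables.

def open_store():
    """Create an in-memory database with a node table and a property table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (
            id INTEGER PRIMARY KEY,
            parent_id INTEGER,
            position INTEGER,
            class TEXT
        );
        CREATE TABLE property (node_id INTEGER, name TEXT, value TEXT);
    """)
    return conn

def store_grove(conn, node, parent_id=None, position=0):
    """Insert a grove node and its subtree; return the row id of the node."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, class) VALUES (?, ?, ?)",
        (parent_id, position, node["class"]),
    )
    node_id = cur.lastrowid
    for name, value in node.get("properties", {}).items():
        conn.execute(
            "INSERT INTO property (node_id, name, value) VALUES (?, ?, ?)",
            (node_id, name, str(value)),
        )
    # Children keep their document order through the position column.
    for index, child in enumerate(node.get("children", [])):
        store_grove(conn, child, node_id, index)
    return node_id
```

Keeping properties in a separate name and value table means the schema does not change when a different grove plan exposes different properties, at the cost of less direct queries.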

Industrial and user community benefits

Completing the OpenJade tool to include the full DSSSL specification is going to be a very important contribution to the community and the industry.

As explained above, many companies start to develop applications, and when some problem is encountered, a partial solution is applied in terms of another language or technology. This way, the promised benefits of adopting a global technology are not met, and multiple partial solutions must be sought in order to solve problems.

The best thing DSSSL offers is a consistent processing model, rather than a large number of disjointed solutions. The DSSSL language defines a data model for the representation of SGML documents, and it contains in ONE language all the capabilities for querying, transforming, and applying style to SGML documents. And it is a proper programming language.

When modularization of OpenJade is finished, it will be possible to use parts of OpenJade for other tools. The grove module will be accessed and used to construct groves for other tools, such as SGML editors or SGML databases. One example of this is the book e-server shown above.

Parts pending work

The SGML parser, which in the OpenJade tool is a separate part called SP, will receive some additions such as W3C extensions, XML conformance (well-formedness checking), XML namespaces, and XML Schema.

The current OpenJade implementation includes a lot of one-off code for things such as string manipulation and Scheme interpretation. This “eccentric” code should, whenever possible, be replaced by more standard implementations. For example, any string manipulation should use C++ STL strings whenever possible. Likewise, the Scheme interpreter should be “borrowed” from an existing freeware implementation of Scheme.


Acknowledgments

Thanks to Trent C. Shipley for compiling the minutes of the discussions on the OpenJade-devel and DSSSL distribution lists [minu], which were used as a skeleton for this paper’s first drafts. Thanks also to Paul Tyson for allowing me to use his words, for reviewing the various drafts which were sent to him, and for his accurate suggestions.


Bibliography

[corba] Corba: http://www.corba.org/

[DBK] DocBook project: http://www.docbook.org/

[diag] UML diagrams for Jade, available at http://openjade.sourceforge.net/documentation

[DSL] DSSSL distribution list: http://www.mulberrytech.com/dsssl/dssslist/index.html

[DSSSL] ISO/IEC 10179:1996, Document Style Semantics and Specification Language (DSSSL)

[grmi] Grove Minder: http://epremis.com/

[grvi] Grove View: http://www.isogen.com/demos/groveview.html

[HyT] ISO/IEC 10744, April 1995. Available at ftp://infosrv1.ctd.ornl.gov/pub/sgml/WG8/HyTime/TC

[JADE] Jade: http://www.jclark.com/jade/

[Kim93] W. Eliot Kimber. HyTime and SGML: Understanding the HyTime HyQ query language 1.1. Technical report, IBM, August 1993. Available at ftp://ftp.ifi.uio.no/pub/SGML/HyTime/HyQ-1.1.Kimber.

[minu] Minutes of the OpenJade discussions, Trent Shipley, http://lists.sourceforge.net/lists/listinfo/openjade-devel

[next] Next Solution http://www.nextsolution.co.jp/

[OJ] OpenJade: http://sourceforge.net/projects/openjade/

[OJDL] OpenJade-devel list: http://lists.sourceforge.net/lists/listinfo/openjade-devel

[OJID] Matthias Clasen, OpenJade Internals, http://openjade.sourceforge.net/internals/t1.html

[Reynolds&Kimber 2002] J. Reynolds and W. Eliot Kimber. “XML in a Distributed World: Exposing DOM/Groves Through CORBA”. XML Europe 2002, Barcelona, Spain.

[scheme] IEEE Standard 1178-1990, IEEE Standard for the Scheme Programming Language, published by IEEE in 1991. ISBN 1-55937-125-0

[SF] SourceForge: http://sourceforge.net/

[sgml] ISO 8879:1986, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)

[UML] http://www.omg.org/uml/


