Efficiency structured XML (esXML): XML without most processing overhead: Thousands of messages per second with a more straightforward, yet more sophisticated application coding

Stephen D. Williams


esXML is a new portable structure that retains all XML features, avoids parsing, and minimizes overhead in consuming, traversing, modifying, and producing esXML. New semantics include efficient pointers, copy-on-write layering of changes to a base document, and direct representation of binary content, such as images. esDOM, the esXML API, is similar in concept to a collection interface while also managing extensible intermediation between an application and business object data. Implementing a type of zero-copy processing while supporting 4GL-like semantics in 3GL languages, esXML and esDOM are designed to improve common application development and accelerate processing of XML data.

Keywords: DOM; Markup Languages

Stephen D. Williams

Stephen Williams is an IT system and network security professional with 20 years of experience. Expertise includes system and network architecture design, security analysis and implementation, project management, advanced transaction and database systems, clusters, PKI, real-time and offline video processing, and rule engines. Security projects have included the first Bank of America firewall and web DMZ, secure non-repudiation of large-scale text and video messaging at America Online, numerous network security designs and implementations, grid computing security analysis, and service as lead architect and manager of a highly sensitive secure government payment system employing PKI smart cards, mobile Java applications, and signed and encrypted XML documents. Mr. Williams has also provided analysis of smart cards, biometrics, and location technologies, such as GPS, in support of government overview of sensitive issues, such as Internet Gaming. Analysis, planning, and recommendations for the use of PKI and improved security architecture were also provided to the US Department of Justice. Developer of the first versions of AOL’s BuddyList and creator of AOL Instant Images, Mr. Williams has also participated in IETF standards efforts including the Instant Messaging Presence Protocol working group. Key areas of interest have included: many XML-related methods and extensions to standards and paradigms, biometric research and presentation to the Nevada Gaming Commission, creation of satellite-based communication software, cryptographically secure digital notary and secure transaction systems, architecting advanced information processing grid systems, creating expert systems and rule engines, and designing federated communications technologies and architectures.

Stephen D. Williams [Senior Technical Director; High Performance Technologies, Inc.]

Extreme Markup Languages 2003® (Montréal, Québec)

Copyright © 2003 Stephen D. Williams. Reproduced with permission.


Tired of writing endless getters and setters to support multiple tiers and sometimes small amounts of application logic? After optimizing your application logic, are you left with a too-slow system because of the processing overhead of XML or other protocols and data formats? Is XML development too cumbersome? esDOM and esXML are presented here, providing: an improved API, a new efficiency structured binary infoset that avoids parsing and other overhead, and a number of new useful semantics.

The goal of this project is to greatly improve many kinds of application development by extending the use of XML in ways that are simpler to program and much more efficient, and by adding useful new semantics. There are two components: a programmer-facing API, esDOM, and the back-end storage and network-facing layers, esXML. esDOM is patterned as a high-level collection library that provides improved access to DOM-style storage in a way that is a direct analogue of traditional object-oriented object access. By supporting concepts such as scoped reference objects, this data structure can be used to provide the member data for native language classes in a seamless way.

Secondary goals include compression and the ability to express binary data directly; however, unlike other ‘binary’ related projects, these are entirely secondary except for avoiding base64 encoding of raw blocks of data, such as an image.

esXML history

Table 1
Date (YYYYMMDD)   Milestone
19980901          First Designs
20000601          bsXML prototype implemented for MPEG4 scene graphs
20011201          SDOM design finalized for secure application using signed XML document objects
20021101          Initial rewrite of paper
20030415          Significant content, background written
20030605          References, esXML, esDOM details added

Extended abstract

Progress has been made in the sophistication of data models used for communication between applications, storage systems, and communication links. This sophistication, which has significant benefit, has often incurred additional programming requirements and loss of efficiency. This paper proposes an overall strategy and outlines a specific design that retains the gains of modern data management methods while simplifying application development and greatly enhancing efficiency.

XML has served as a regular, minimalist, and sufficient basis for widespread standardization, interoperability, and innovation. Simultaneously, processing models have evolved that allow applications to consume and create XML as part of distributed application development. The componentization and distributed services models create many tiers and data/communications transitions. Increasingly, interface layers, data parsing and conversion, and memory management have caused a large amount of processing overhead. This overhead tends to cancel the benefits of XML and leads to unacceptable performance issues in complex systems. Although the textual nature of XML exacerbates the problem, most communication interfaces suffer from a similar overhead.

Existing interprocess communication and data storage methods generally require data parsing, conversion, and serialization steps as data moves between in-memory formats and serialized network and storage formats and protocols. In many cases, significant code is written to perform these conversions, frequently dominating application development and the resulting processing. These issues are more pronounced with XML processing. The ideas of standardized format and regular semantics can be extended by the creation of a data format that supports XML semantics, is portable, has the same wire and memory format, and supports in-place modification.

Efficiency Structured XML retains all XML 1.0+ features while proposing a new structure. This structure is portable, directly supports useful new semantics, and is extremely efficient to the point of avoiding nearly all overhead in parsing, consuming, traversing, modifying, and producing esXML. XML and esXML can be cross converted with no loss of data and it is envisioned that XML libraries would support both. New semantics include highly efficient pointers, copy-on-write layering of arbitrary changes to a base document/object, and direct representation of binary content, such as images. The proposed API for manipulation of an esXML object is called esDOM. esDOM is a simplified API to the esXML document/object, similar in concept to the C++ Standard Template Library or other collection interfaces. esDOM is also a sophisticated library that manages extensible intermediation between an application and business object data. Implementing a type of zero-copy processing while supporting 4GL-like intermediation semantics in 3GL languages, esXML/esDOM is designed to radically improve common application development and accelerate processing.

The state of XML

What’s right with XML

XML in many ways has caused a revolution in data storage habits, standards, tools, and development. Much has been written about the scope, compromises, and expected usage. Often mentioned benefits include textual basis, support for both document expression and object data, support for Unicode, simplicity, readability, portability and parseability, similarity to HTML and SGML, and related standards that cover validation, paths, and common industry data formats. The key benefits of XML derive not only from the extensibility, simplicity, and standardization, but more importantly from the idioms that are expected to be adhered to when possible. These idioms create an expectation of more sophisticated application processing of data. Some of these perceived idioms include tolerance of change and extensions, expression of values as text formatted in standard ways, and data-driven validation rather than embedded code.

What’s wrong with XML?

Recognizing that XML is a compromise that is not appropriate for everything, there are still improvements that could be made that cover many types of usage without losing most benefits. With improvements, the range of appropriate applications can actually increase significantly. While it will not make sense to build low-level infrastructure and specialized data structures, such as TCP/IP or image bitmaps using XML, high level messages, transactions, and nearly all enterprise data and business objects should be expressed, stored, and processed in a format like XML.

The main problems with XML are often related to its strengths. The matched-tag text basis is verbose and must be parsed, which causes many performance issues. Features needed to support arbitrary data structure analogues are missing, including pointer-like references and delta and difference abilities. Additionally, binary data and arbitrary text that might contain markup characters must be text encoded, there is no predefined typing, and it can be problematic to nest XML documents arbitrarily.

Application vs. infrastructure development

An older division in software development was between system programmers and application programmers. I have long argued that C++ application developers should have focused on a shallow usage of C++ classes while “system programmers” developing libraries like STL did the heavy lifting. The uncertainty of that division and the lack of a library like STL unnecessarily frustrated application developers. The increased sophistication of modern N-tier applications has made this problem worse. We have also graduated to more subtle divisions of development that relate to our desire to layer, standardize, simplify, and in general reach the promised land of code reuse and componentized assembly. Modern programmers focus on areas including: presentation, business rules, controller logic, business objects and documents, infrastructure data, security, data access / database, communication / network access methods, application and transaction servers, distributed processing, and frameworks and libraries to support many of these areas. It could be argued that there is reason to divide these areas between those that are business oriented and, therefore, likely to evolve over time and have many external interface needs (“Business”) and those that are infrastructure. Infrastructure could be further divided into fixed, low-level, limited complexity support (“Low-Level-Infrastructure”), such as TCP/IP, and layers that are much more complex, likely to evolve in large ways, and involve many competing technology bases (“High-Level-Infrastructure”). It is generally agreed that Low-Level-Infrastructure should normally be implemented by highly optimized hand-coding for specific tasks. Conversely, for both Business and High-Level-Infrastructure, changes and improvements in requirements, data elements, interfaces, compatibility, and security cause continual updates to data structures and everything influenced by them. Many traditional methods do not handle this constant change efficiently because any change must be reflected in modifications to many areas and tiers of a system and requires awkward management of old versions.


Each programming language has its own strengths and weaknesses with regard to data structures and native I/O. Application developers are expected to express all data structures with native idioms. This makes sense for self-contained applications where most of the work is internal and possibly complex. Modern N-tier Enterprise applications require numerous network and component interfaces, coordination methods, and standard interchange formats. In many types of applications, this communication (and related processing) overhead can dwarf the actual work of the application. This effect continues to become more pronounced as layering, sophistication, and componentization increase. As monolithic application architecture has progressed to integral network connectivity and component frameworks, methods to transform internal data structures to external, “wire” formats and back were developed. This process, variously referred to as “serializing”, “marshalling”, “flattening”, or “pickling”, packages data into arrays of bytes for storage or transfer using some convention. These conventions were often custom developed, but as the need to avoid development and debugging issues increased, attempts at standardization led to numerous alternatives. In considering these alternatives, efficiency and impact on programming complexity should be considered. In addition to the wire data model and related preparation and coding, overall efficiency is a very important measure, as are architecture portability, flexible communication methods, and scalability.

Wire data model and related preparation and coding

Early network programs simply copied the memory format of data elements to the network, possibly after packing into buffers. Eventually, the concept of “network byte order” was added to standardize the format of binary scalar data when architectures differed only in byte order. The first network communications libraries, such as ONC-RPC, simply provided methods that encoded basic types into buffers in network byte order. IDLs [Interface Definition Languages] were created both to document these interfaces and to simplify creation of the tedious code needed, preferably in a programming-language-independent way. Libraries and frameworks that use them include ISO ASN.1 (along with encoding formats, such as BER and DER), DCE, MS DCOM, CORBA (which finally matured to include a standardized encoding format in IIOP), and now .NET Remoting. IDL inevitably suffers from being limited to the least-common-denominator features of each target language and from definitions that do not match the data definitions in each language. While an IDL compiler produces stubs for each endpoint program, it is still up to the programmer to work in native language structures and perform conversions in code added to those stubs. Java RMI, being exclusively for Java endpoints, was able to avoid much of this gap, and since Java ostensibly runs everywhere there is potentially no architecture gap. Unfortunately, the modern programming environment is usually not so uniform and RMI suffers from other issues, such as being solidly tied to the RPC paradigm. In addition to the development preparation and effort required to work with IDL, IDL generally creates fixed code that expects predetermined structure. Many IDL-based systems support change only through creation of additional interfaces (lookupv1, lookupv2, etc.). New stubs must be populated and versions must be carefully managed.
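The hand-written marshalling that early RPC libraries required can be sketched in Java as follows. This is an illustration only, not the XDR encoding used by ONC-RPC: the class name WireRecord and its layout (a 4-byte ID, a 4-byte length, then UTF-8 name bytes) are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical record used only for illustration: an int ID plus a name.
final class WireRecord {
    final int id;
    final String name;

    WireRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Marshal: copy each field into a buffer in network (big-endian) byte order.
    byte[] encode() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + nameBytes.length);
        buf.order(ByteOrder.BIG_ENDIAN); // already the ByteBuffer default; stated for clarity
        buf.putInt(id);
        buf.putInt(nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }

    // Unmarshal: parse the bytes and allocate new objects before the data is usable.
    static WireRecord decode(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire).order(ByteOrder.BIG_ENDIAN);
        int id = buf.getInt();
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return new WireRecord(id, new String(nameBytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        WireRecord copy = decode(new WireRecord(258, "lookup").encode());
        System.out.println(copy.id + " " + copy.name);
    }
}
```

Every field added to such a record means touching both encode and decode at every endpoint, which is the maintenance burden that IDL compilers were meant to reduce.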

The success of protocols like SMTP, POP/IMAP, FTP, and especially HTTP has illustrated brilliantly the benefits of thin infrastructure, text-basis, straightforwardness, and open development. Based on this success and the success of XML as a standard that many can accept as a good compromise, new programmatic network communication standards, such as XML-RPC, SOAP, REST, .NET, and BEEP, have been created. These approaches are gaining popularity because of their tools, potential model simplicity, extensibility, and the expression of documentation and definition in a formalism directly related to the actual protocol. The most significant issues with these approaches are performance and communication model.

Data and method discovery — DCOM, SOAP, .NET, ODBC/JDBC

Applications can be built with explicit code for network communication interfaces or an interpretive discovery interface can be used. Discovery interfaces are often used by development tools or reporting tools that can explore services and data sources. Additionally, scripting languages can dynamically add features or access to services using discovery. Other interfaces, such as the database access libraries ODBC/JDBC and the proprietary drivers they rely on, provide an interface where IDL-like code generation is usually not used. Applications are written using a high-level interface that flexibly interprets the number and types of columns output in the restricted communication paradigm supported by traditional databases. This allows applications to support both expected responses with fixed code and interpretive processing, such as a flexible reporting window, using the same interface.

The ideal resource and service discovery mechanism will take a while to develop, but there are a number of increasingly popular efforts. ODBC and its Java counterpart, JDBC, are examples of a standardized API without a standardized protocol, even when the protocol is eminently straightforward. There are many problems with this situation, which are likely to be resolved by the use of data access services instead of SQL-based direct ODBC/JDBC.


Efficiency isn’t intuitive to many programmers and poorly designed and written prototypes that seem fast can be overwhelmed by production loads. While faulty application algorithms can be replaced, a poor infrastructure that has taken significant work to integrate with can be a severe burden. Efficient methods and optimization strategies are relatively well understood by some in the industry, but much of this knowledge has not been incorporated into the architecture and design of network communication methods. Additionally, modern methods have shifted away from low-level efficiency and toward programmer, standard tool, and archival efficiency. Ideally, the use of more open and sophisticated techniques could also benefit from highly efficient computational characteristics.


Nearly all communications over a network involve conversion between memory and wire formats in some way. While storage in disk and tape files requires formatting similar to that needed for wire communication, most computing paradigms are concerned with language-specific constructs rather than efficient input and output. Until recently, many data structures provided by software environments were rudimentary, leaving application developers to create application-specific data structures. More modern environments have increasingly sophisticated implementations of a variety of classic data structures and algorithmic operations that can be used to efficiently implement most application logic.


Compression of XML to reduce transmission and storage size and possibly reduce parsing overhead is a frequently targeted goal. The two main methods used are the addition of general purpose compression to the XML datastream and direct coding in binary, incrementally built formats. The latter is very much like CORBA, ASN.1 BER/DER, ONC-RPC, and other packed binary formats. While these can be efficient on the wire and parsed quickly, they cannot be manipulated directly and still require a parsing, object creation, and data copy step before the data is ready for use.

In either case, compression as an additional step or as an alternative wire format does not reduce the processing overhead in a significant way. General compression methods include: Lempel-Ziv-Welch/block coding, ZIP/GZIP/BZIP2, compact encoding, such as custom Huffman, and dictionary compression. These and similar methods are used in various combinations for protocol data compression.

Communication model

The communication model used by network applications can greatly affect efficiency, throughput, latency, and capacity. The efficiency of message processing is enhanced or hidden depending on the communication model used. The two main network communication models are briefly discussed.

Traditionally and commonly today, many network applications have used synchronous, half-duplex, single-threaded communication models. Examples of this include various types of RPC, such as ONC-XDR, DCE, DCOM, CORBA, RMI, XML-RPC, HTTP 1.0, and synchronous SOAP. This model is characterized by a client application making a connection, sending a single request, waiting on the reply from the server, and then making the next request. The server in this model is passively waiting for a connection and request and expects to process one transaction before working on the next.

To improve efficiency, throughput, capacity, and latency, the richer model of asynchronous, pipelined, and possibly chunked communication should be used. This model employs persistent connections, asynchronous messages, transaction IDs, pipelined requests and responses, and is more peer to peer than client / server. Examples include HTTP 1.1; IMAPv4; Instant Messaging/Presence systems like AOL BuddyList/AIM, Yahoo Messenger, and Jabber; and BEEP. The JMS [Java Message Service] defines a high-level API for developing message oriented systems which are implemented using a product that adheres to the JMS interface. In some cases these use a true message oriented communication model while in others the model is more logical and the actual communication model is primarily RPC-based. This group of models has two major operating modes: point-to-point and publish/subscribe. Often, both of these are used in the same system. Point-to-point describes any system where messages are routed simply by the destination of a connection or by an explicit address of the remote endpoint. Examples of this would be web requests via HTTP or instant messages to a certain user. Publish/subscribe describes systems where endpoints register to get messages that match certain criteria. JMS explicitly supports publish/subscribe through message filters; another example is the buddy list in an instant messaging/presence system. The key characteristics that affect efficiency are: the ability to transmit a stream of requests or responses with processing overlapped with that of the other endpoints, support for asynchronous messages from any party without polling, and bulk transfer of small transactions with far fewer I/O operations.
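The role of transaction IDs in a pipelined connection can be sketched as follows. This is a minimal in-memory illustration, not the mechanism of any protocol named above: the class PipelinedClient and its methods are hypothetical, and the network write is omitted.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side bookkeeping for one persistent, pipelined connection.
final class PipelinedClient {
    // A request in flight: its transaction ID and a future for the reply.
    static final class Ticket {
        final int id;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Ticket(int id) { this.id = id; }
    }

    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, Ticket> pending = new ConcurrentHashMap<>();

    // Queue a request and return at once; a real client would also write
    // (id, payload) to the connection here instead of waiting for a reply.
    Ticket send(String payload) {
        Ticket t = new Ticket(nextId.getAndIncrement());
        pending.put(t.id, t);
        return t;
    }

    // Called whenever a response arrives; responses may come in any order.
    void onResponse(int id, String body) {
        Ticket t = pending.remove(id);
        if (t != null) {
            t.reply.complete(body);
        }
    }

    int outstanding() {
        return pending.size();
    }

    public static void main(String[] args) {
        PipelinedClient client = new PipelinedClient();
        Ticket first = client.send("GET /a");
        Ticket second = client.send("GET /b");    // sent before the first reply arrives
        client.onResponse(second.id, "reply-b");  // out-of-order completion
        client.onResponse(first.id, "reply-a");
        System.out.println(first.reply.join() + " " + second.reply.join());
    }
}
```

Because each reply is matched by ID rather than by position, the sender never has to stall the connection while one slow request completes.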


In a number of application scenarios, tracking what has changed can be very useful but difficult to accomplish. The most well known solution to this problem is to use a relational database with transactional abilities to support rollback when required. This is of little help for data in an application’s working memory. Other common examples include undo capabilities in interactive applications and exception handling needs in sophisticated programs. Additionally, it is possible to achieve much higher performance with lower memory requirements in multi-session server applications, such as a web application server, if complex base state can be ‘copied’ and modified in each session by a lightweight copy-on-write data layer. While some mainframe and certain language environments include some of this functionality, esXML and esDOM offer this ability for business objects in common languages while remaining efficient.
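The copy-on-write idea can be sketched with ordinary Java collections. This is not the esXML mechanism, which works on ranges of a binary structure; the class CowLayer and its path-keyed map are assumptions chosen to show the principle of a shared base plus a small per-session layer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-session layer over shared base state, keyed by path.
final class CowLayer {
    private final Map<String, String> base;                      // shared, never modified here
    private final Map<String, String> changes = new HashMap<>(); // this session's writes
    private final Set<String> erased = new HashSet<>();          // this session's deletions

    CowLayer(Map<String, String> base) {
        this.base = base;
    }

    // Reads consult the layer first and fall through to the base.
    String get(String path) {
        if (erased.contains(path)) {
            return null;
        }
        String changed = changes.get(path);
        return changed != null ? changed : base.get(path);
    }

    void set(String path, String value) {
        erased.remove(path);
        changes.put(path, value);
    }

    void erase(String path) {
        changes.remove(path);
        erased.add(path);
    }

    // Undo or rollback is just discarding the layer; the base was never touched.
    void rollback() {
        changes.clear();
        erased.clear();
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("/cart/total", "0");
        CowLayer session = new CowLayer(base);
        session.set("/cart/total", "42");
        System.out.println(session.get("/cart/total") + " base=" + base.get("/cart/total"));
    }
}
```

Each session costs only the memory of its own changes, and many sessions can share one base without copying it.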

Stream vs. object (SAX vs. DOM)

XML, being derived from SGML and HTML, is designed to be suitable for documents of any size. XML was also designed to be suitable for use as a data interchange format for general use. Two processing models have evolved centered around each of these types of data. SAX is a standard API for a parser that maintains no memory or structure and simply parses tokens and makes callbacks to application code to consume each part of a document. This stream processing is useful in processing documents that are arbitrarily large because they can be processed as they are read with only the minimum resources required for the application. This method is also used when an application is reading data to populate an application specific internal memory structure. The DOM [Document Object Model] combines the process of parsing XML, possibly with a SAX parser, with a standard memory structure and API. Because it is providing an object view of the entire document separate from application logic, DOM must read the entire XML document into memory before application logic has access. This can simplify application development, although DOM is oriented toward XML concepts rather than advanced container and collection interfaces.
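The difference between the two models can be seen with the standard JAXP parsers shipped with the JDK. The sketch below counts item elements both ways; the class name SaxVersusDom and the sample document are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxVersusDom {
    static final String XML = "<order id=\"7\"><item>pen</item><item>ink</item></order>";

    // SAX: the parser keeps no tree; the handler sees each element as it streams by.
    static int countItemsWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    // DOM: the whole document is parsed into a tree before the application sees it.
    static int countItemsWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("item").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countItemsWithSax(XML) + " " + countItemsWithDom(XML));
    }
}
```

In both cases the text must be parsed in full before the count is known; the models differ only in who holds the resulting structure.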

Most application use of data objects requires them to be treated as a complete object, which is compatible with DOM or an application specific equivalent combination of SAX and a data structure. esDOM is an optimized DOM operating on the esXML structure. esXML can be used in streaming mode by sending general update or append-only deltas.


The core ideas that esXML is based on are independent of XML: it is possible to have a portable, modifiable, and rich data structure that is efficient and is represented the same in memory, on the wire, and in storage. esXML is such a structure: a portable, modifiable, and rich binary infoset.

esXML is a highly structured equivalent to XML 1.0+, and could be called a binary infoset. This structure is intended to become a standard, portable format with features that directly address efficiency and certain extended semantics. The range of use is intended to be broad and include most ‘business’ application object data. esXML does not require a schema and it does not, in the most typical usage, precompile tag tokens.

esXML is divided into the following layers:

  • Elastic Memory and Virtual Pointers (vptrs or vpointers)
  • Base structure and options
  • Tables
  • Elements, Attributes, and Data

Elastic memory

Elastic Memory is the name for the data structure that forms the basis for esXML. The name derives from the idea that to solve certain constraints you need a virtual memory space that can stretch and compress efficiently. These constraints include being able to modify complex data in place by growing the size of a range of memory. Conversely, when the size of a data item fluctuates, the cost should be minimal and should normally be incurred only when a memory range grows above a high water mark relative to the fluctuations.

esXML is meant to be read and written without requiring any processing, although in some circumstances condensing may be desired. To allow for a tunable balance between efficient I/O and modification overhead, esXML data is divided into tunable chunks, or blocks. Typically these would be 4-256K in size. An implementation could have an allocation cache of esXML blocks which would allow objects to be read in, modified, and written without any further memory management. Each block has a block ID.

The elastic memory layer provides a mini-virtual memory space. This memory space has two special features: data can be inserted and deleted at any point and vptrs [virtual pointers] can be created that point to and track any point regardless of surrounding insertions and deletions. This is done by tracking which bytes and characters in each block are used or not used with a range list, a bitmap, or a bitmap of range lists. If a block is full, an insertion will split the block at the insertion point or midway. If a block is not full, data after the insertion point is condensed to or toward the end of the block, creating a gap. When data is deleted, the gap is tracked as available space. In the case of a ‘variable’, i.e., an element or attribute value that changes size frequently, this means that reallocation will only happen when size increases above a high water mark. Vptrs are tracked in the block where they are anchored. Any change within that block is reflected in vptr values if necessary.
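The way a vptr tracks its target through insertions and deletions can be sketched as follows. This is a deliberately simplified model, not the esXML layout: it uses a single block backed by a StringBuilder and shifts data directly instead of maintaining used/unused range lists, gaps, or block splits. The names ElasticBlock and VPtr are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single block whose virtual pointers follow the data they point at.
final class ElasticBlock {
    // A virtual pointer: an offset that the owning block keeps up to date.
    static final class VPtr {
        private int offset;
        int offset() { return offset; }
    }

    private final StringBuilder data = new StringBuilder();
    private final List<VPtr> vptrs = new ArrayList<>(); // vptrs anchored in this block

    VPtr pointTo(int offset) {
        if (offset < 0 || offset > data.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        VPtr p = new VPtr();
        p.offset = offset;
        vptrs.add(p);
        return p;
    }

    // Insert text; every vptr at or after the insertion point moves with its target.
    void insert(int at, String text) {
        data.insert(at, text);
        for (VPtr p : vptrs) {
            if (p.offset >= at) {
                p.offset += text.length();
            }
        }
    }

    // Delete [from, to); vptrs after the range shift back, vptrs inside it clamp to 'from'.
    void delete(int from, int to) {
        data.delete(from, to);
        int removed = to - from;
        for (VPtr p : vptrs) {
            if (p.offset >= to) {
                p.offset -= removed;
            } else if (p.offset > from) {
                p.offset = from;
            }
        }
    }

    char charAt(VPtr p) { return data.charAt(p.offset); }

    String contents() { return data.toString(); }

    public static void main(String[] args) {
        ElasticBlock block = new ElasticBlock();
        block.insert(0, "<a>value</a>");
        VPtr p = block.pointTo(3);     // points at the 'v' of "value"
        block.insert(0, "<root>");     // data before the target grows
        System.out.println(p.offset() + " " + block.charAt(p));
    }
}
```

Confining the adjustment loop to the vptrs anchored in one block is what keeps the cost of a modification bounded by the block size rather than the document size.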

An esXML delta contains a reference to the parent document’s GUID. The elastic memory layer for a delta contains a data range reference to the parent. Changes create data and erase ranges, which insert or delete data from the virtual address space of the parent. These three types of range (data, parent, and erase) form the data ranges that are quickly traversed for any operation of a dependent layer.
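A delta built from the three range types can be sketched as a sequential list applied over the parent's address space. The encoding here is an assumption for illustration (a parent range copies the next n characters of the parent, an erase range skips them, and a data range supplies new content); the class DeltaLayer is hypothetical and the parent is modeled as a String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical delta expressed as a sequence of data, parent, and erase ranges.
final class DeltaLayer {
    enum Kind { DATA, PARENT, ERASE }

    private static final class Range {
        final Kind kind;
        final int length;   // used by PARENT and ERASE
        final String data;  // used by DATA
        Range(Kind kind, int length, String data) {
            this.kind = kind;
            this.length = length;
            this.data = data;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    DeltaLayer parent(int length) { ranges.add(new Range(Kind.PARENT, length, null)); return this; }
    DeltaLayer erase(int length)  { ranges.add(new Range(Kind.ERASE, length, null));  return this; }
    DeltaLayer data(String text)  { ranges.add(new Range(Kind.DATA, 0, text));        return this; }

    // Traverse the ranges once to produce the layered view; the parent is never modified.
    String resolve(String parentDoc) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // current position in the parent's address space
        for (Range r : ranges) {
            switch (r.kind) {
                case PARENT:
                    out.append(parentDoc, pos, pos + r.length);
                    pos += r.length;
                    break;
                case ERASE:
                    pos += r.length;
                    break;
                case DATA:
                    out.append(r.data);
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String base = "<qty>1</qty>";
        DeltaLayer delta = new DeltaLayer().parent(5).erase(1).data("25").parent(6);
        System.out.println(delta.resolve(base));
    }
}
```

The delta stores only the changed value and two references into the parent, so its size is proportional to the change rather than to the document.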

Base structure

The base structure of an esXML document includes version information and options which may include: a GUID and/or Tumbler for the document, indication of being self contained, whether there is a dictionary cross-reference, whether tables have back references, whether dictionary maps are used, and block size.

Figure 1: esXML structure

This diagram shows how the elastic memory and XML representation layers interact. It also indicates how element, attribute, and body data are represented.


Tables can optionally be used to track tags, names, and strings. The use of tables is not required in a particular document. Each entry in a table receives a unique ID used in all references. The table structure uses a linear array of entries, a vector map of entries, and an index for fast searching. Key compression is used when possible. Each implementation determines when and how analysis is done to maximize table use.
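A table of this kind can be sketched as an array of entries plus a search index. This is a minimal illustration under assumed names (NameTable, intern, lookup); it omits the vector map and key compression mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tag/name table: each distinct string is stored once and referenced by ID.
final class NameTable {
    private final List<String> entries = new ArrayList<>();      // linear array: ID -> string
    private final Map<String, Integer> index = new HashMap<>();  // index: string -> ID

    // Return the existing ID for a string, or assign the next one.
    int intern(String name) {
        Integer existing = index.get(name);
        if (existing != null) {
            return existing;
        }
        int id = entries.size();
        entries.add(name);
        index.put(name, id);
        return id;
    }

    String lookup(int id) {
        return entries.get(id);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        NameTable tags = new NameTable();
        int order = tags.intern("order");
        int item = tags.intern("item");
        int again = tags.intern("item"); // repeated tag reuses the same ID
        System.out.println(order + " " + item + " " + again + " " + tags.lookup(item));
    }
}
```

A document with thousands of repeated tags then carries each tag name once, and every node refers to it by a small integer.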


The data in an esXML document is represented by a depth-first structure of nodes consisting of length, type, and literal or table tag reference. This is repeated for attribute, body, and processing instructions. This structure allows rapid traversal both at a particular node level and traversal into the depth of structure. For large numbers of nodes at certain levels, optional indexing can be enabled by a particular implementation. This allows rapid searching for particular node names, node IDs, and indexing into large lists of similar nodes (as an array).
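The benefit of length-prefixed, depth-first nodes can be sketched as follows. The byte layout is an assumption made for the example and is not the esXML format: each node is a 4-byte total length (header plus all descendants), a 1-byte type, and a 1-byte table tag ID, followed by its children. The class NodeWalk is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical node encoding: [int length][byte type][byte tagId][children...].
final class NodeWalk {
    static final int HEADER = 6;    // 4-byte length + 1-byte type + 1-byte tag ID
    static final byte ELEMENT = 1;

    // Encode an element from its tag ID and its already-encoded children.
    static byte[] element(int tagId, byte[]... children) {
        int length = HEADER;
        for (byte[] child : children) {
            length += child.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(length);
        buf.putInt(length).put(ELEMENT).put((byte) tagId);
        for (byte[] child : children) {
            buf.put(child);
        }
        return buf.array();
    }

    // List the tag IDs of the direct children of the node at 'offset'.
    // Each child's whole subtree is skipped in one step using its length field.
    static List<Integer> childTags(ByteBuffer doc, int offset) {
        List<Integer> tags = new ArrayList<>();
        int end = offset + doc.getInt(offset);
        int child = offset + HEADER;
        while (child < end) {
            tags.add((int) doc.get(child + 5)); // tag ID follows the length and type
            child += doc.getInt(child);
        }
        return tags;
    }

    public static void main(String[] args) {
        // Tag IDs: 0 = root, 1 and 2 = children of the root, 3 = grandchild under 1.
        byte[] doc = element(0, element(1, element(3)), element(2));
        System.out.println(childTags(ByteBuffer.wrap(doc), 0));
    }
}
```

Walking one level never touches the bytes of deeper levels, and descending is a matter of starting the same loop at a child's offset.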

Comparable efficiency

Compared to traditional methods of managing deserialization, memory and object management, and serialization, esXML is much more efficient. This is especially true of traditional XML processing in component and N-Tier application environments. The precise performance relationship depends on particular application patterns; however, there are clear advantages in the lack of required overhead. A ‘hello world’ esXML/esDOM application performs almost no processing other than I/O. Only a small number of ‘chunks’ need to be allocated and can be kept in a pool for further processing.

Figure 2: Comparable efficiency

This graphic illustrates the expected efficiency of esXML vs. traditional XML processing methods.


The esDOM is similar to the core Document Object Model API with basic XPath support but with an interface suited for use directly as a data object or collection, such as an STL hashmap. The API supports both shallow and deep copies, but notably includes a ‘scoped esDOM’ type of reference. A scoped esDOM is an object that holds a reference to a subtree of the overall document tree and allows access only to that subtree, similar to a Unix/Linux chroot(). This is obtained by passing a path to a factory method. A scoped esDOM can be used to represent the member variables for business objects that combine to form an overall business document object. By using this method, full object orientation can be retained, with object-specific methods and even object-specific setters and getters.
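The scoped reference idea can be sketched independently of esDOM itself. In the illustration below, ScopedStore stands in for a document and Address for a business object whose member data lives in a subtree; both classes, the flat path-to-value map, and the simple slash-separated paths are assumptions and not the esDOM API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical document store; a scope is a view confined to one subtree.
final class ScopedStore {
    private final Map<String, String> data; // shared by the document and all its scopes
    private final String root;              // path prefix this view is confined to

    ScopedStore() {
        this(new HashMap<String, String>(), "");
    }

    private ScopedStore(Map<String, String> data, String root) {
        this.data = data;
        this.root = root;
    }

    // Factory method: obtain a view that can only reach the subtree at 'path'.
    ScopedStore scope(String path) {
        return new ScopedStore(data, resolve(path));
    }

    String get(String path) {
        return data.get(resolve(path));
    }

    void set(String path, String value) {
        data.put(resolve(path), value);
    }

    // Paths are always relative to the scope root and may not climb out of it.
    private String resolve(String path) {
        if (path.contains("..")) {
            throw new IllegalArgumentException("path escapes scope: " + path);
        }
        return root + "/" + path;
    }
}

// A business object whose member variables are stored in a scoped subtree.
final class Address {
    private final ScopedStore members;

    Address(ScopedStore members) {
        this.members = members;
    }

    String getCity() { return members.get("city"); }

    void setCity(String city) { members.set("city", city); }

    public static void main(String[] args) {
        ScopedStore document = new ScopedStore();
        Address shipTo = new Address(document.scope("order/shipTo"));
        shipTo.setCity("Montreal");
        System.out.println(document.get("order/shipTo/city"));
    }
}
```

The object keeps its own setters and getters, yet no copying or conversion is needed when the enclosing document is stored or sent, because the member data was never held anywhere else.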

The core methods provided by esDOM allow direct creation, management, and search of data within the esDOM data object representation. These methods are analogous to setters and getters, collection accessors, and DOM idioms. Because esXML has pointer, array, and map semantics, most traditional data structures can be represented and managed by libraries. The core esDOM interface is illustrated here in Java. The ‘path’ argument refers to an XPath reference to a particular element text, attribute, or subtree.

Figure 3: Basic esDOM API
public interface esDOMInterface {
    /* Constructors and initializers */
    public esDOMInterface cloneesDOM();
    /* Constructors cannot be declared in a Java interface; implementing
       classes such as esDOM are expected to provide these:
         esDOM();
         esDOM(InputStream ps);
         esDOM(File file); */
    public esDOMInterface load(InputStream ps) throws esDOMException;
    public void save(OutputStream ps) throws esDOMException;
    public void load(File file) throws esDOMException;
    public void save(File file) throws esDOMException;
    Node getContextNode();
    public String getRootPath();

    /* Factory Methods */
    public esDOMInterface getesDOM(String path) throws esDOMException;
    public esDOMInterface clone(String path) throws esDOMException;

    /* Append, insert, and set esDOM subtrees */
    public Node append(String path, esDOMInterface sdom) throws esDOMException;
    public Node insert(String path, esDOMInterface sdom) throws esDOMException;
    public Node set(String path, esDOMInterface sdom) throws esDOMException;

    /* Append, insert, and set data elements, converting non-string data types */
    public Node append(String path, Object data) throws esDOMException;
    public Node insert(String path, Object data) throws esDOMException;
    public Node set(String path, Object data) throws esDOMException;

    /* Getter for String return value. */
    public String get(String path) throws esDOMException;
    public String get(String path, Object defaultValue) throws esDOMException;

    /* Getters for data types other than String. */
    public boolean getBoolean(String path) throws esDOMException;
    public Byte getByteObject(String path) throws esDOMException, NumberFormatException;
    public byte getByte(String path) throws esDOMException, NumberFormatException;
    public Character getCharacter(String path) throws esDOMException;
    public char getchar(String path) throws esDOMException, NumberFormatException;
    public Double getDouble(String path) throws esDOMException, NumberFormatException;
    public double getdouble(String path) throws esDOMException, NumberFormatException;
    public Float getFloat(String path) throws esDOMException, NumberFormatException;
    public float getfloat(String path) throws esDOMException, NumberFormatException;
    public Integer getInteger(String path) throws esDOMException, NumberFormatException;
    public int getint(String path) throws esDOMException, NumberFormatException;
    public Long getLong(String path) throws esDOMException, NumberFormatException;
    public long getlong(String path) throws esDOMException, NumberFormatException;
    public Date getDate(String path) throws esDOMException, ParseException;

    /* Getters that return a default when the path is absent ('default' is a reserved word in Java). */
    public boolean getBoolean(String path, Object defaultValue) throws esDOMException;
    public Byte getByteObject(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public byte getByte(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public Character getCharacter(String path, Object defaultValue) throws esDOMException;
    public char getchar(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public Double getDouble(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public double getdouble(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public Float getFloat(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public float getfloat(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public Integer getInteger(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public int getint(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public Long getLong(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public long getlong(String path, Object defaultValue) throws esDOMException, NumberFormatException;
    public Date getDate(String path, Object defaultValue) throws esDOMException, ParseException;

    public String[] getAll(String path) throws esDOMException;
    public Node append(String path, String[] values, int n) throws esDOMException;

    public void setState(int state);
    public void setReadOnly();
    public void setReadWrite();
    public void setUnavailable();
    public boolean isReadOnly();
    public boolean isReadWrite();
    public boolean isAvailable();
    public String toString();

    public int elementCount(String path);
    public boolean elementExists(String path);
    public void remove(String path) throws esDOMException;
    public void debug(boolean flag);
    public void print();
    public void print(PrintStream ps);
    public void prettyPrint();
    public void prettyPrint(PrintStream ps);

}

The esDOM interface includes append, insert, set, remove, get, and count operations along with clone, creation of scoped views, and convenience functions for basic types. Extended versions of this API contain complete collections algorithms, similar to C++ STL or Java container classes. This API is also designed to be subclassed for special-purpose semantics.
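
One plausible reading of the typed convenience getters is sketched below. It assumes that values are held as text, that a getter converts on access, and that the two-argument form returns its default when the path is absent. DefaultingStore is a hypothetical class, and these semantics are assumptions rather than documented esDOM behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy store showing one possible meaning of get(path, defaultValue). */
class DefaultingStore {
    private final Map<String, String> text = new HashMap<>();

    /** Non-string data is converted to text on the way in (an assumption). */
    void set(String path, Object data) {
        text.put(path, String.valueOf(data));
    }

    String get(String path, String defaultValue) {
        String raw = text.get(path);
        return raw != null ? raw : defaultValue;
    }

    /** Throws NumberFormatException when the stored text is not an integer. */
    int getint(String path, int defaultValue) {
        String raw = text.get(path);
        return raw != null ? Integer.parseInt(raw.trim()) : defaultValue;
    }

    boolean getBoolean(String path, boolean defaultValue) {
        String raw = text.get(path);
        return raw != null ? Boolean.parseBoolean(raw.trim()) : defaultValue;
    }
}

class DefaultingStoreDemo {
    public static void main(String[] args) {
        DefaultingStore store = new DefaultingStore();
        store.set("/b", 2);
        store.set("/d", false);
        System.out.println(store.getint("/b", 0));          // stored value, converted from text
        System.out.println(store.getint("/missing", 7));    // path absent, so the default is used
        System.out.println(store.getBoolean("/d", true));
    }
}
```

Under this reading, the NumberFormatException in the interface signatures corresponds to stored text that cannot be converted, while a missing path is handled by the default.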

Figure 4: Example esDOM method calls
esDOM es = new esDOM();
es.append("/a", "test");
es.append("/b", 1);
es.append("/c", 1.1);
es.append("/d", false);
es.set("/b", 2);
es.insert("/aa", "test2");
es.insert("/a[2]/a", "test3");
es.append("/a", "test4");
String test = es.get("/a[3]"); /* "test4" */
int count = es.elementCount("/a"); /* 3 */

esDOM esa = es.getesDOM("/a");
test = esa.get("/a"); /* "test3" */

These esDOM method calls demonstrate the ease of development and similarity to object set and get methods.

Competing solutions

There are no competing solutions known to the author that incorporate an efficient XML-equivalent representation that supports in-place modification with a common wire and memory data structure format. Additionally, features, such as copy-on-write layering of business objects to support in-memory transaction semantics and numerous lightweight sessions, are not generally available to developers.
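
The copy-on-write layering mentioned above can be sketched in a few lines. The sketch assumes the base document is a map keyed by path and that each session holds a thin layer of changes over it; CowLayer and its methods are hypothetical and do not reflect the esXML chunk format.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** A change layer over a shared base; the base is untouched until commit. */
class CowLayer {
    private final Map<String, String> base;
    private final Map<String, String> changes = new HashMap<>();
    private final Set<String> removed = new HashSet<>();

    CowLayer(Map<String, String> base) { this.base = base; }

    /** Reads see the layer first and fall through to the base. */
    String get(String path) {
        if (removed.contains(path)) return null;
        String local = changes.get(path);
        return local != null ? local : base.get(path);
    }

    void set(String path, String value) {
        removed.remove(path);
        changes.put(path, value);
    }

    void remove(String path) {
        changes.remove(path);
        removed.add(path);
    }

    /** Discard the layer's changes: the in-memory equivalent of a rollback. */
    void rollback() {
        changes.clear();
        removed.clear();
    }

    /** Fold the layer into the base: the in-memory equivalent of a commit. */
    void commit() {
        base.keySet().removeAll(removed);
        base.putAll(changes);
        rollback();
    }
}

class CowLayerDemo {
    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("/account/balance", "100");

        CowLayer sessionA = new CowLayer(base); // two lightweight sessions share one base
        CowLayer sessionB = new CowLayer(base);

        sessionA.set("/account/balance", "80");
        System.out.println(sessionA.get("/account/balance")); // session A sees its own change
        System.out.println(sessionB.get("/account/balance")); // session B still sees the base

        sessionA.commit();
        System.out.println(sessionB.get("/account/balance")); // the committed value is now shared
    }
}
```

Each session costs only the size of its changes, which is what makes numerous lightweight sessions over one base document practical in this model.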

One example that is somewhat more related is SXML, which represents an equivalent of XML with Scheme S-expressions. This representation is both text based and directly reflects a Scheme data structure. Parsing is still required; however, the concept of a direct mapping between an XML dataset and a 3GL data structure is similar to the goal of esXML.

Another close example is a serialized pyxie tree (Python pickles). This is a persistent DOM representation that is somewhat efficient for Python applications.

Most solutions that are somewhat similar fall into these categories:

  • Wire Compression — the object is to minimize wire format (bin-xml, WBXML, BML)
  • Application Specific, ‘Compiled’ encoding — (WML, MPEG4 scene graphs, MPEG7)
  • XML Libraries to assist parsing and representation — (expat, LibXML2, Xerces, Xalan, MSXML, DOM, SAX)

Note: esXML should not suffer from binary overflow problems because the sizes and references are internal and relative to a chunked virtual address space.

The following is a list of a few of the alternatives to XML and esXML:

  • The MPEG-7 Binary Format, a type of compiled XML. It is closely related to binXML.
  • StAX (Streaming API for XML): a Java-based API for pull-parsing XML.
  • XML Short-tagging

Steve Roberts, John Hudzina, and Stephen Atherton acted as foils, provided feedback, and helped develop benchmark testbeds. Steve helped refine the name; Stephen is an esXML developer. Management, clients, and colleagues at High Performance Technologies, Inc. and others, including Lea Bavaro, provided support and encouragement.


Stephen D. Williams [Senior Technical Director, High Performance Technologies, Inc.]