This paper describes Hydra, a graphical software tool designed for use on the Web, that is intended to simplify the building, editing, and searching of large and complex information hierarchies. Specifically, the initial development focuses on facilitating the development of XML [Extensible Markup Language] based standards used in supporting business-to-business e-commerce transactions within the electronics industry. Hydra uses novel 3-dimensional methods for displaying and manipulating such structures in a graphical format that simplifies their representation, navigation and understanding by a user.
As the price of computing equipment, and therefore the price of computation, continues its rapid fall, larger and larger groups of information are being processed with the aid of computers. Business interaction, for example, is finally moving toward an entirely electronic model. In the past, in order to describe a business transaction, two businesses would exchange numerous small documents formatted for human readers (letters, paper parts catalogues, etc.). As the process has become more automated, many documents, often overlapping, have been combined into larger documents, each of which describes a particular business interaction. As the documents grow larger and more complex, and as they contain more and more information designed to help automate business processes rather than aid the human user, they become unwieldy and unfriendly to human users, and information becomes increasingly difficult to find.
The Hydra visualization system described in this paper addresses three primary tasks users face when dealing with large XML-based standards and business-process-oriented documents¹: creating a new document (and filling the necessary information into that document); editing an existing document; and searching for information within an existing document. These tasks deal with human conceptualization and comprehension abilities. The computer cannot provide all of the information needed to fill in these documents itself; it must therefore help the human user by organizing, relating, and focusing relevant information. It must present a large amount of information in a useful and understandable way. This paper proposes a novel way to support those tasks so that human users are significantly more comfortable when performing them.
The PIP [Partner Interface Process®] standards, developed by the RosettaNet consortium (http://www.rosettanet.org), are typical of the large and complex document types used to automate business processes. Each describes a particular business-to-business interaction as a sequence of steps in which one or more messages or documents are exchanged between business partners. Business data required for the business transaction is contained within these documents as structured information (i.e. a hierarchical, XML-formatted document).
The first set of human comprehension problems arises from the size of the PIP standards. For example, NIST [the National Institute of Standards and Technology] has been developing tools to facilitate the development of standards for B2B [business-to-business] interaction utilizing the PIP labeled 2C2. The 2C2 standard describes an Engineering Change Order, and has two associated documents. The first is an Engineering Change Request; the second is an Engineering Change Confirmation (which itself can contain proposed changes to the Change Request, etc.). Each of those documents contains slightly more than 500 elements.²
Moreover, the 2C2 is only a moderately-sized PIP; a 3A4 PIP, for example, has upwards of 1,200 elements. A given 2C2 Change Request might also contain dozens of email addresses: a contact address for each partner, contact information for parts suppliers, approved manufacturer contact information for each changed part, and so on. In order to be able to navigate³ through such a large document, the user must have both a sense of where s/he is within the larger document and a detailed, uncluttered view of the current information being examined or edited.⁴ Otherwise, either the user cannot see what the results actually contain, or s/he cannot see how the result is connected to the information model as a whole (e.g., Whose email address is this? Why is it included? Alone, an element titled “ID”, when the document contains 20 other “ID” elements, will not mean much to the user).
A second set of problems relates to NIST’s efforts to include PIP generation and editing capabilities into a web-based interface. We wanted the NIST tools for building and editing the PIP document standards to have an interface that would be intuitive for business-level persons to use, not only programmers. A web-based interface makes that possible. Such an interface allows the user to visit the same location every time s/he must do something PIP-related, and makes use of an existing framework (the web browser) to present the interface. Web-based tools are, for the most part, platform-independent as well.
Initially, to create the web-based interface mentioned in the previous paragraph, we used Java Servlets and the Java API for XML Processing (JAXP) to dynamically generate HTML [Hypertext Markup Language] forms based on a given PIP instance document. While this method was flexible (that is, the servlet could take essentially any PIP and generate a form), the generated HTML forms provided very little contextual information. The forms required the user to “drill down” to the element in which s/he was interested by clicking a series of “submit”-type buttons, each of which generated a new HTML page based on the user’s decision. The end result had at least two deficiencies: first, at each new level, the user lost sight of everything from the previous level, such as other possible paths s/he might take and information about where s/he currently is within the tree; second, the common understanding of the Web’s GUI [graphical user interface] that the user has already built up from using web forms in the past was not only useless, but in fact a hindrance. In practice, “submit” buttons on a web page are used to indicate that one is finished with the task at hand. However, in this new context, these buttons performed precisely the opposite function: the user clicked them in order to begin entering data. Consequently, this GUI hindered rather than facilitated data entry; it was impractical for entering data into a newly created PIP instance, and it offered no searching capabilities at all.
Take, for example, the task of filling in a zip code for the business description of a manufacturing partner. The user is first presented with a page containing a button representing the root of the tree. No cues are given. S/he must click that button, then one on a new page to indicate that they want a business partner. The next page might have to- and from- role descriptions to fill in. The user is then presented with a new page with different address components (addressLine1, cityName, etc.). If the user clicked on “to-role”, but wanted “from-role”, s/he must first figure out that, even though the page looks identical, s/he has clicked the wrong node, and then must find the browser’s back button, conceptually removing him/herself from the task at hand, in order to undo their mistake. The user also might not know which option to take at a particular juncture, and since each path taken must be completely retraced before the user can take a different one, filling in a document from scratch, or particularly editing an existing document, can be an arduous task.
We were unable to find any off-the-shelf tools that would meet the requirements of the standards consortia with whom we collaborate. Therefore, we found it necessary to design and develop the Hydra 3D visualization tool.
In keeping with these principles, as many GUI elements as possible are made available via a single mouse or keyboard action. The tree display panel utilizes a number of independent visual cues to minimize the number of mouse clicks performed while navigating the tree. All visualization and editing action centers around and includes the main display canvas, whereas file attachment and other unrelated functions (such as writing out to a file) are separated and hidden from view until needed. In general, options are available only when they make sense with regard to the UI, not whenever a programmed variable can be set to one value or another.
The resulting software is known as the Hydra visualization system.
The only toolkit available to us that provided the necessary functionality (i.e., to interact cleanly with the user in a web-browser-based environment, as well as to provide a 3-dimensional interface) was Java3D (http://java.sun.com/products/java-media/3D/). Though Java has traditionally been viewed as slow — and, indeed, that view is often correct — the Java3D toolkit is essentially a high-level wrapper around native hardware-accelerated 3D functions. As such, it is fast. Our test machines were able to comfortably display hundreds (and, in some cases, thousands) of nodes (that is, the visual representation of an XML “element”) simultaneously and still allow user interaction at full speed.
The remainder of the program was developed using standard Java Swing classes (http://java.sun.com). Database access is provided via the standard MySQL JDBC [Java Database Connectivity] driver.
The first, and perhaps most noticeable, characteristic of Hydra is that it uses a 3-dimensional (3D) user interface for document navigation.
One-dimensional (1D) interfaces, such as the HTML form-based interface mentioned above, had already shown their weaknesses — navigation and providing context are extremely difficult for large documents. Two-dimensional (2D) interfaces were the next logical step. We researched and implemented a proof-of-concept tool using the University of Maryland’s ZUI [Zoomable User Interface] Java APIs (http://www.cs.umd.edu/hcil/jazz/download/index.shtml). The concept behind the ZUI is that everything is laid out on a single plane, which can be moved closer or further from the viewer via mouse and keyboard navigation. While the user’s contextual location was, indeed, much easier to determine than with a 1D UI, determining some comprehensible layout for a document the size of a PIP was not practical, nor was determining any consistent way to separate the zooming and selecting actions in order to allow data entry.
Hydra uses a layout method that takes full advantage of all 3 dimensions without overly complicating the point-and-click navigation concept.⁵
Initially, an XML document or schema is parsed with JAXP [Java API for XML Processing], and the resulting in-memory DOM [Document Object Model] tree is used to build a visually analogous tree in 3 dimensions. The tree is laid out from left to right, with the root node at the far left. (NOTE: in the following text “node” means the visual object representing an XML element, whereas “element” means the corresponding data contained in the DOM tree.)
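The parsing step described above can be sketched with the standard JAXP and DOM classes. This is a minimal illustration under stated assumptions, not Hydra's actual source: the class name `DomWalkSketch`, the `depth:name` output format, and the sample element names are hypothetical. The point is only that a depth-first walk over the DOM visits each element once, parents before children, which is the order a visual tree builder would need.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomWalkSketch {
    /** Parses XML text with JAXP and returns the in-memory DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Depth-first walk that records "depth:name" for every element, parents first. */
    static void walk(Element element, int depth, List<String> out) {
        out.add(depth + ":" + element.getTagName());
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Text, comment, and other non-element nodes are skipped here.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        Document doc = parse("<Partner><Address><ZipCode>20899</ZipCode></Address></Partner>");
        walk(doc.getDocumentElement(), 0, out);
        out.forEach(System.out::println);
    }
}
```

In a real builder, the line that records the element name would instead create the visual node and attach it to its parent's node.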
Mouse-based navigation of the tree is accomplished by using all three buttons available on a standard mouse (or system-dependent modifier keys, if no second or third buttons are available). See Figure 1. Any or all of the mouse actions can be combined. All user interaction is conceptually performed on the tree; the user is not moving him- or herself around the tree, the user grabs and moves the tree. The first button is used to select a node and rotate that node and its subtree about the x-axis. Rotation speed is adjustable for the same reasons that mouse pointer acceleration is adjustable in most operating systems: some users like precision, others prefer speed. Since rotation is the primary mode of operation for Hydra, only rotation speed is adjustable (drag and zoom speed are not). The second button is used to drag the tree around the x-y plane, and the third button is used to zoom the tree forward or backward along the z axis. Zooming is proportional; as the tree gets closer to the viewer, the zoom gets slower to enhance precision. Apart from increasing precision, the proportional zoom also makes it impossible to zoom “past” the tree — the tree is aligned along the x-y plane at z=0, and as the viewpoint approaches z=0, zoom speed approaches 0 as well.
Mouse-Based Navigation (top to bottom: examples -- button 1 (left, rotates) button 3 (middle, zooms); combined diagram)
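The proportional zoom can be illustrated with a few lines of arithmetic. The sketch below is an assumption about how such a zoom might be computed, not the formula Hydra uses: the gain constant, the clamp, and the names `ProportionalZoomSketch` and `zoomStep` are hypothetical. Each step moves the viewpoint by a fraction of its current distance from the z=0 plane, so the step size shrinks as the viewpoint approaches the tree and the plane is never crossed.

```java
class ProportionalZoomSketch {
    /** Hypothetical gain: fraction of the remaining distance covered per unit of mouse drag. */
    static final double GAIN = 0.1;

    /**
     * Returns the new viewpoint z after one zoom step.
     * A positive dragAmount zooms in (toward z = 0); a negative one zooms out.
     */
    static double zoomStep(double viewerZ, double dragAmount) {
        // Clamp the fraction so one large drag can never jump across the z = 0 plane.
        double fraction = Math.max(-0.9, Math.min(0.9, GAIN * dragAmount));
        return viewerZ - fraction * viewerZ;
    }

    public static void main(String[] args) {
        double z = 10.0;
        for (int i = 0; i < 5; i++) {
            z = zoomStep(z, 5.0);
            System.out.println("step " + (i + 1) + ": z = " + z);
        }
    }
}
```

Because the distance is multiplied by a positive factor at every step, repeated zooming in brings the viewpoint arbitrarily close to the tree without passing through it.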
One of the biggest problems with a document of this size is the conflict between making the display compact, and making the display clear. Hydra’s solution to that problem is to hide nodes that are not likely to be immediately relevant. The tree can be collapsed and expanded, either all at once, or one node at a time. When a node is collapsed, its connectors remain as a visual cue (if there are any children, there is a connecting bar, and if more than one child, a small disc is also visible), but its children (and the rest of that node’s subtree) disappear, allowing the user to rapidly identify what it is s/he really wants to see. A particular node can also be “isolated” with a single click, which means that only the particular path leading to that node remains visible. Again, isolation is a very rapid way of pruning unnecessary visual clutter from the tree.
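The collapse and isolate operations amount to two different rules for deciding which nodes are drawn. The following sketch models them on a plain tree; the `TreeNode` class and the method names are hypothetical and stand in for whatever scene-graph objects the real program manipulates.

```java
import java.util.ArrayList;
import java.util.List;

class CollapseSketch {
    static final class TreeNode {
        final String name;
        final TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();
        boolean collapsed;

        TreeNode(String name, TreeNode parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Names of all nodes that remain visible, given each node's collapsed flag. */
    static List<String> visible(TreeNode root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(TreeNode node, List<String> out) {
        out.add(node.name);            // a collapsed node itself stays visible
        if (node.collapsed) {
            return;                    // but its whole subtree is hidden
        }
        for (TreeNode child : node.children) {
            collect(child, out);
        }
    }

    /** Isolation: only the path from the root down to the target stays visible. */
    static List<String> isolate(TreeNode target) {
        List<String> path = new ArrayList<>();
        for (TreeNode n = target; n != null; n = n.parent) {
            path.add(0, n.name);
        }
        return path;
    }
}
```

Collapsing costs one flag per node, while isolation needs only the chain of parent links, which is why both can respond to a single click even on large trees.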
Keyboard navigation is also provided, primarily as a backup and to increase user familiarity during the initial learning phase. Keystroke actions also serve to provide other, less frequently used movement options, such as “looking” left, right, up, and down, rather than translating or “strafing”. Standard Java3D key bindings are utilized.
Each node in the tree consists of a shape and a label.
A visual node
The shape is used as a visual indicator as well as a sort of “handhold” to increase the amount of space the user has to select a given node. The label displays the element’s name. Each node can be individually selected; when an element is selected, it is given the “highlight color”, and only one element can be selected at a time. Each node can be rotated; when an element rotates, so does its entire subtree. Labels are only visible for 180 out of 360 degrees of a node’s rotation. Consequently, the entire tree appears less cluttered; only the “forward-facing” labels are visible (the user does not have to selectively ignore illegible “backward-facing” text). The chosen shape is a cone, which helps orient the viewer as to the direction and flow of the tree. Color is assigned to a node based on the group of siblings to which it belongs — this helps the viewer associate related information, both with regard to nodes on different levels, and with regard to situations in which there are many nodes on the same level belonging to different parents.
If a node has no children, it only consists of the shape and the label. (fig. 2) If it has one child, the node is directly connected to its child by a cylindrical bar, and the child is aligned along the y- and z-axes — only its horizontal position is different. (fig. 3)
A node with one child
If a node has more than one child, it is connected via a cylindrical bar to a “child disc”; the children are laid out at equal intervals along the circumference of the child disc. The disc itself is visually permeable, either via partial transparency or by being composed of lines rather than shaded shapes; the key point is that the children are visible through the parent’s “child disc”, so that the process of laying out the tree does not inadvertently make viewing the tree any more difficult. (fig. 4)
An expanded multi-child node (left) with multiple types of children
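Placing children at equal intervals on the child disc is a small piece of trigonometry. The sketch below assumes that the disc is perpendicular to the left-to-right axis of the tree, so each child receives a (y, z) offset from the disc's center; that orientation, the starting angle, and the names `ChildDiscLayoutSketch` and `layout` are assumptions made for illustration.

```java
class ChildDiscLayoutSketch {
    /**
     * Returns one {y, z} offset per child, spaced evenly around a circle
     * of the given radius centered on the child disc.
     */
    static double[][] layout(int childCount, double radius) {
        double[][] offsets = new double[childCount][2];
        for (int i = 0; i < childCount; i++) {
            double angle = 2.0 * Math.PI * i / childCount;
            offsets[i][0] = radius * Math.cos(angle); // y offset
            offsets[i][1] = radius * Math.sin(angle); // z offset
        }
        return offsets;
    }

    public static void main(String[] args) {
        for (double[] p : layout(4, 2.0)) {
            System.out.printf("y=%.3f z=%.3f%n", p[0], p[1]);
        }
    }
}
```

A layout of this kind keeps every child at the same distance from the parent's axis, so rotating the parent about the x-axis brings each child to the front in turn.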
Coloring the tree turned out to be more difficult than expected. The choice of node colors should aid the user in understanding node-to-node relationships in situations of many closely-clustered nodes, or in the case of overlapping nodes. Initially, each major subtree was given its own color, and all nodes in that subtree were of the same color. While major branches were identifiable, the overlapping or cluttered-node situations generally arose within one major tree branch, not between one branch and another. The second version of the coloring algorithm called for each node to receive one of only a few bold colors; for example, children 1 through 6 might receive unique colors, and children 7 through 12 would cycle back through the same colors. While separation was achieved, two problems resulted: first, the nodes were not clearly identifiable as part of a parent-child relationship; second, in situations where colors were repeated, false visual relationships were established (child 1 and 7 of the previous example appeared to be somehow related more closely than child 1 and 3, even though they were not).
The current coloring algorithm attempts to correct the shortcomings of each of the previous two. Firstly, all node parts are colored with a single color (the shape, label, connecting bar, and connecting disc) so that parent-child relationships are clearly visible. When a child node is attached to a purple connecting disc, for example, the user can reliably follow the purple color back to the parent node. Secondly, all children of a given node have a single, randomly-generated color, indicating their sibling relationship. The result is a tree that has clear visual separation while maintaining clear parent-child and child-child relationships.
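A minimal sketch of the sibling-group rule follows. It assumes colors are packed 24-bit RGB integers drawn from `java.util.Random`; the `ColoredNode` class and the `colorize` method are hypothetical names, and the real program may choose or constrain its random colors differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SiblingColorSketch {
    static final class ColoredNode {
        final String name;
        final List<ColoredNode> children = new ArrayList<>();
        int rgb; // packed 0xRRGGBB, shared by the node's shape, label, bar, and disc

        ColoredNode(String name) {
            this.name = name;
        }

        ColoredNode add(ColoredNode child) {
            children.add(child);
            return this;
        }
    }

    /** Gives the root its own color, then one random color per sibling group. */
    static void colorize(ColoredNode root, Random rng) {
        root.rgb = rng.nextInt(0x1000000);
        colorChildren(root, rng);
    }

    private static void colorChildren(ColoredNode parent, Random rng) {
        if (parent.children.isEmpty()) {
            return;
        }
        int groupColor = rng.nextInt(0x1000000); // drawn once, shared by all siblings
        for (ColoredNode child : parent.children) {
            child.rgb = groupColor;
            colorChildren(child, rng);
        }
    }
}
```

Purely random draws can occasionally produce two similar colors for unrelated groups; a production version might reject colors that are too close to the parent's color or to the reserved highlight and search colors.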
Below the canvas are two panels, the current node panel and the search panel. Both are visible at all times, since their associated actions are frequently used while navigating the tree. The current node panel is closer to the 3D canvas than the search panel, as it is the most frequently used.
Whenever a new node is selected via a mouse selection action, the current node panel is updated to reflect the new selection. The panel displays the node’s name as well as one or more levels of its parents’ node names (scrollable back to the root node of the tree). Consequently, the user can view, in one place, all relevant information about the node on which s/he is currently working. If the node contains data, an editable text field at the right side of the panel (or other, more appropriate, control, depending on the type of data to be entered) becomes active to allow user interaction. (fig. 5)
The current node panel after a node containing data has been selected.
If the node has no data, the field is not editable, is grayed out, and displays a message to the effect that the node cannot be edited. The point of having seemingly redundant visual cues is to more effectively catch every user’s attention, and avoid wasting time attempting to enter data into non-editable elements. (fig. 6)
The current node panel with nothing selected
If the node does contain data, the user can either enter new data or edit the existing data via the text field. Changes are saved automatically whenever the active node changes, that is, whenever the user selects a new node, begins a search, or simply deselects the node on which s/he was previously working. The user may have to enter several hundred discrete pieces of information, and as such, removing the need to click a “Save” button after every change significantly decreases the number of times the user’s hands must shift between mouse and keyboard. If a mistake is made (the only case in which the lack of a “Save” button is a detriment), a “Back” button is provided which is capable of stepping back through every activated node from program startup. A mistake that would have been prevented by a “Save” button would need to be recognized by the user; as such, repairing the error almost invariably means returning to the immediately previous node, which is a single-click action in this program.
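The save-on-selection-change behavior and the “Back” history can be modeled with a map and a stack. The sketch below is illustrative only: nodes are identified by name for brevity (a real document with many identically named elements would need unique identifiers), and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class AutoSaveSketch {
    private final Map<String, String> saved = new HashMap<>();
    private final Deque<String> history = new ArrayDeque<>();
    private String activeNode;
    private String pendingText;

    /** Selecting a node first commits any edit pending on the previous node. */
    void select(String nodeId) {
        commit();
        if (activeNode != null) {
            history.push(activeNode);
        }
        activeNode = nodeId;
        pendingText = saved.get(nodeId);
    }

    /** Models the user typing into the text field of the active node. */
    void type(String text) {
        pendingText = text;
    }

    /** Steps back to the previously active node, committing the current edit first. */
    String back() {
        commit();
        if (!history.isEmpty()) {
            activeNode = history.pop();
            pendingText = saved.get(activeNode);
        }
        return activeNode;
    }

    String savedValue(String nodeId) {
        return saved.get(nodeId);
    }

    private void commit() {
        if (activeNode != null && pendingText != null) {
            saved.put(activeNode, pendingText);
        }
    }
}
```

Note that this model keeps only the latest value per node, which matches the limitation discussed later in the paper: stepping back returns to the node but cannot restore text that was overwritten.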
The search panel allows the user to search for a particular node, or nodes, anywhere within the tree. The panel consists of three primary sections: a graphical information and interaction section, a textual input section, and an options section. (See Figure 7)
The options section allows the user to select whether element names (e.g., ContactName or ZipCode), element data (e.g., NIST or 20899), or both, will be queried when searching for the requested string/substring.
The search panel
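The three search options reduce to a choice of which strings are tested against the query. The sketch below assumes a case-insensitive substring match; the paper does not state whether matching is case-sensitive, so that choice, along with the `Scope` enum and the `matches` method, is hypothetical.

```java
import java.util.Locale;

class SearchScopeSketch {
    enum Scope { NAMES, DATA, BOTH }

    /** True if the query occurs in the element's name and/or data, per the chosen scope. */
    static boolean matches(String elementName, String elementData, String query, Scope scope) {
        String q = query.toLowerCase(Locale.ROOT);
        boolean inName = elementName.toLowerCase(Locale.ROOT).contains(q);
        // Elements without data (pure containers) can only match by name.
        boolean inData = elementData != null
                && elementData.toLowerCase(Locale.ROOT).contains(q);
        switch (scope) {
            case NAMES:
                return inName;
            case DATA:
                return inData;
            default:
                return inName || inData;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("ZipCode", "20899", "zip", Scope.NAMES));
        System.out.println(matches("ZipCode", "20899", "zip", Scope.DATA));
    }
}
```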
The graphical information section contains an image that indicates to the user what the search results will look like once they have been activated within the main 3D canvas. For example, the current implementation uses a cone colored the search result color. Once a search string has been entered in the textual input section, two forms of feedback are given.
In the 3D canvas, the tree is collapsed, and only the subtrees above the search result nodes are opened. The logic behind this decision is as follows: the use of the search panel often implies that the user was unable to locate the desired node via manual exploration of the tree. Therefore, the current state of the tree (which nodes are expanded, which collapsed) is irrelevant. Hence, by collapsing the tree, and only opening the relevant branches, the search results are more clearly visible, and the tree’s new state is more likely to be applicable to the user’s needs. The result nodes are also expanded to roughly twice their original size and colored a reserved “search color”, again to aid in their identification. Once the search is completed (i.e. either the search field is emptied or the “Clear Search” button is pressed), the nodes are returned to their normal size and color.
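Opening only the branches that lead to search results means expanding every ancestor of every result and nothing else. The sketch below computes that set from a child-to-parent map; the map representation, the use of string identifiers, and the name `nodesToExpand` are assumptions chosen to keep the example short.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class SearchRevealSketch {
    /**
     * Given a child-to-parent map (the root has no entry) and the ids of the
     * result nodes, returns the ids that must be expanded so every result is
     * visible. All other nodes can stay collapsed.
     */
    static Set<String> nodesToExpand(Map<String, String> parentOf, Collection<String> results) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String result : results) {
            String ancestor = parentOf.get(result);
            // Stop early once an ancestor is already marked: its own
            // ancestors were added when it was first encountered.
            while (ancestor != null && expanded.add(ancestor)) {
                ancestor = parentOf.get(ancestor);
            }
        }
        return expanded;
    }
}
```

The result nodes themselves are not added to the set, so their own subtrees remain collapsed unless they happen to contain another result.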
In the search panel itself, the indicator image (the white cone in Figure 7, used to indicate where the user is within the search results) is overlaid with a search counter displaying the number of search results. If there are 10 or more results, the user is presented with “10+”, as the difference between, for example, 35 and 37 results is insignificant — in nearly every such case, the user must refine his/her search in order to find the information for which s/he is searching.
Since the UI rules throughout the program allow the user to click on any cone and receive some sort of visual feedback, the indicator image is no different. When the user clicks on the image, the first search result becomes the selected node in the 3D canvas (its information is pulled up in the current node panel, and its appearance is set accordingly). The search counter is set to indicate the current result number — on the first click, it is set to 1, on the second to 2, et cetera. The user can therefore cycle through the search results without having to move his or her hand at all. Once the end of the results is reached, the counter is cleared. Should the user click again, the cycle restarts.
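The counter behavior described in the last two paragraphs can be captured in a small state machine. The sketch makes two assumptions that the text leaves open: the counter clears on the click after the last result, and a cleared counter shows the total again (capped at “10+”). The class and method names are hypothetical.

```java
class SearchCounterSketch {
    private final int resultCount;
    private int current; // 0 means no result is currently selected

    SearchCounterSketch(int resultCount) {
        this.resultCount = resultCount;
    }

    /** Text overlaid on the indicator image. */
    String label() {
        if (current > 0) {
            return Integer.toString(current);
        }
        return resultCount >= 10 ? "10+" : Integer.toString(resultCount);
    }

    /** Advances to the next result; returns 0 when the cycle has just been cleared. */
    int click() {
        current = (current >= resultCount) ? 0 : current + 1;
        return current;
    }

    public static void main(String[] args) {
        SearchCounterSketch counter = new SearchCounterSketch(3);
        for (int i = 0; i < 5; i++) {
            System.out.println(counter.click() + " -> " + counter.label());
        }
    }
}
```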
Drop-down menus are a common technique for hiding infrequently used (and, unfortunately, sometimes VERY frequently-used) actions and options for a program. However, when an application is presented within a web-based interface, the use of drop-down menus within the application can become confusing (the user is used to a single menu bar for each program, and the web browser already has a menu bar). To avoid this possible confusion, Hydra has relatively few options, and instead utilizes a tabbed-pane-based approach.⁶
The navigation pane
The navigation pane is visible at program start and contains actions and options pertaining to the navigation of the 3D tree. It contains the back button, the show/hide all toggle button, an option to have the current view follow the mouse-based selection, and a slider to control rotational speed. (fig. 8) All of these actions are directly related to visual manipulation. The file-based actions are considered a separate set of actions that do not affect the visual representation of the tree, and so are grouped together on a separate, initially hidden, pane.
The file pane
The file pane contains actions that are typically performed once the visual editing and exploration of the tree are complete. It contains an attach file button, a save button that writes the updated DOM tree out to an XML instance file, and a quit button. (fig. 9) These actions are frequently performed sequentially, and are separated visually from the rest of the program.
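Writing a DOM tree back out as XML is commonly done with the identity transform from the standard `javax.xml.transform` package. The sketch below serializes to a string so that it is easy to inspect; it is one plausible way to implement a save action, not necessarily the one Hydra uses, and the class name, the sample document, and the decision to omit the XML declaration are assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSaveSketch {
    /** Serializes a DOM document to XML text using an identity transform. */
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    /** Builds a tiny sample document standing in for an edited instance. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element address = doc.createElement("Address");
            Element zip = doc.createElement("ZipCode");
            zip.setTextContent("20899");
            address.appendChild(zip);
            doc.appendChild(address);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```

To write to a file instead, the `StreamResult` can be constructed from a `java.io.File` rather than a `StringWriter`.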
The attached files pane
These documents are sent as the payload of either an RNIF or SOAP message (or “envelope”), which is a multipart MIME object that can have included “attached” files. The attached files pane displays the currently attached files that will be added to the containing envelope for this instance document. A cap on file size can be set at compile time. The pane is invisible until a file is actually added, at which point it expands to provide a full view of the attached files. This pane is scrollable to prevent either truncation of filenames or an overly wide pane blocking the user’s view of the 3D canvas.
As we have seen, the Hydra Visualization System allows rapid navigation and understanding of documents that were formerly too large for a normal user to grasp. The program is capable of creating, editing, and saving XML documents, and runs on any platform that supports a few standard Java technologies (Java Applets, Java3D, Java XML parsing). The 3-dimensional display provides an enjoyable user experience, and does not suffer from the shortcomings of most existing XML navigation tools.
The primary disadvantage to the Hydra visualization system at the moment is the hardware requirement. While it runs smoothly on current mid- to high-end video cards, older computers will be unable to use the tool without an upgrade.⁷ Such stringent hardware requirements were unavoidable given the constraints of the project (3D, XML-capable web interface) and current software, but additional speed enhancements and/or lower hardware requirements are critical for widespread use of the tool.
While Java3D is, in theory, a completely platform-independent system (as it runs on top of the Java Virtual Machine), the reality of the situation is somewhat different. As of the writing of this paper, Java3D is available for Solaris, Windows platforms, and Linux via the Blackdown porting project (http://blackdown.org). No Macintosh implementation is currently available. Although porting Hydra to a C/C++ foundation using OpenGL (http://www.opengl.org) libraries would lessen the hardware requirements and dramatically improve portability (nearly every modern operating system includes OpenGL support), the program would no longer be web-based. Still, porting to OpenGL should be considered if the web-based nature of the project becomes less important. If not, further research into alternative-platform implementations of Java3D is necessary.
Hydra is still a work in progress, and therefore has a number of UI issues still to work out. For example, in the current node pane’s display of a node’s ancestors (its path back to the root node), each successive parent should be selectable, and upon selection, should move the view to center around that parent. As it stands now, the ancestor display is merely informational, whereas it should be as interactive as the rest of the program.
Similarly, search display might be refined. While the difference between 35 and 37 results can be viewed as insignificant, the difference between 10 and 250 results may be viewed as significant. As no weighting is being done, direct access to high-numbered results should be provided. Perhaps the results could be grouped into smaller clusters (akin to the current search paradigm at http://www.google.com, or perhaps grouped by 3D location). The result number display should also be improved, perhaps to display the overall number of results as well as the current position within those results (“3 of 126”, rather than “3”). Furthermore, when a search is complete, returning the user to the root of a collapsed tree may not be the most efficient placement of the user within the program. While it is likely the best placement for the second and third major tasks mentioned in the introduction, during a new creation and fill-in task the user will more likely want to be returned to their most recent pre-search position, as well as having the tree returned to its former shape.
Along similar lines, the “Back” button should be redesigned, as it is really attempting to perform two entirely separate tasks. The first is that of a standard back button — returning the user to their next-most-recently-selected object. However, during the editing process, it is also being used to provide keystroke “Undo” functionality, and does not do so very successfully. Should the user accidentally erase needed information within a node before entering new information, there is currently no mechanism in place to retrieve that information. Consequently, the text editing process should most likely have its own undo button provided to clarify the separate tasks.
In its current incarnation, the visual tree is a representation of an in-memory DOM tree built from a parsed XML file. It might be useful to allow Hydra to extract and deal with more detailed information from a given XML document, as well as to parse other types of hierarchical files. Also, generation of a visual tree based on a schema alone (without an associated blank XML instance file) would decrease the number of steps required for setup, and increase the versatility of the program.
This tool should be applicable to a wide range of visualization duties; any hierarchically-arranged data can be displayed by Hydra. For example, IPC has shown interest in a visualization and editing tool for the IPC-2581 draft standard; Hydra could provide such functionality with minimal modifications.
The Hydra visualization system was designed in response to the lack of effective visualization and editing tools for large documents. It uses novel methods to achieve the goals set forth for the project (particularly, the use of the 3-dimensional display), but is based on standard libraries and tools (Java, XML, etc.). It allows the user to comprehend the structure and content of a much larger set of information than is possible with traditional visualization tools, and is available for use in a web-based environment, which allows for rapid, low-maintenance availability to end users.
1. A document, in this case, is generally either a single XML file, or a multipart MIME [Multipurpose Internet Mail Extensions] file containing an XML file and some number of attachments.
2. An element might be a phone number, email address, zip code, or any other piece of information pertaining to this B2B interaction.
3. “Navigate” means, in this context, to extract the information for which one is looking from the document, while also having a sense of location within the larger document.
4. This has been called the “Focus+Context” approach in research originated at Xerox PARC.
5. The overall layout is reminiscent of the Xerox-originated cone tree, but only in inspiration: no functional description or executable version of any cone tree derivative program was available, so all design is original with the exception of one picture in Ben Shneiderman’s book Designing the User Interface.
6. Many existing interfaces are moving toward tabs as a way to efficiently organize grouped information spaces; for example, the Mozilla browser project (http://www.mozilla.org) and nearly all modern IDEs.
7. The NVIDIA GeForce4 TI4200 and GeForce 4 Go! ran smoothly (i.e. 30+ frames per second), but the ATI Rage 128 Pro ran the program at too slow a framerate to be useable (roughly 1 frame per second for 500 elements, less for 1200 elements).
[BOUVIER99] Bouvier, Dennis J. Getting Started with the Java 3D™ API. Mountain View, CA: Sun Microsystems, 1999. <http://developer.java.sun.com/developer/onlineTraining/java3d/>.
[OLSEN98] Olsen, Dan R., Jr. Developing User Interfaces. San Francisco: Morgan Kaufmann Publishers, 1998.
[SHNEID98] Shneiderman, Ben. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Reading, Massachusetts: Addison Wesley Longman P, 1998.