DITA - The mechanics of a single sourcing project.

France Baril
france.baril@ixiasoft.com

Abstract

The Darwin Information Typing Architecture (DITA) is an XML-based, end-to-end architecture for authoring, producing, and delivering technical information.

This paper describes how DITA-based documentation was implemented at CEDROM-SNi, one of Canada's leading online news content aggregators. The project delivers documentation as diverse as user training materials and Web Services reference guides targeted to programmers. This paper focuses on the benefits, how-tos, and lessons learned.

Technical documentation has its own unique challenges. Its deliverables range from simple reference guides and educational material to complex, multilingual procedure manuals. Critical success factors of a documentation project are numerous and diverse – usability, deadlines, cost, language, delivery media (paper, online) – all of which have their own purpose and challenges. This paper discusses these issues and provides a framework for future DITA projects.

Keywords: DITA; Editing/Authoring; Publishing

France Baril

France Baril has a B.A. in Communication and a B.Sc. in Computer Science. She has worked as a multimedia developer, a trainer and a technical documentation specialist. Her focus is on applying the power of XML to technical content development and publishing. She is part of the DITA Technical Committee at OASIS and is currently working on two DITA-based documentation projects. When she is not in front of her computer, you might find her rock-climbing, playing hockey, rollerblading or reading fiction.

France Baril [Documentation Architect; Ixiasoft]

Extreme Markup Languages 2004® (Montréal, Québec)

Copyright © 2004 France Baril. Reproduced with permission.

Introduction

One of the greatest challenges facing technical communicators is delivering multiple quality documentation products (such as procedure manuals, reference guides, training material) as soon as the product being documented is delivered.

Before getting into how implementing DITA and XML helped meet these challenges at CEDROM-SNi, we must define the challenges in more detail. This paper first introduces some of the most important challenges that technical communicators face, and then some that are specific to CEDROM-SNi's project. That first part is followed by a very quick introduction to DITA and to the processes involved in implementing DITA for this specific project. Benefits and lessons learned are reviewed against the identified challenges.

Technical documentation's unique challenges

Creating content that meets users' needs

It sounds obvious that documentation should focus on users' needs and that meeting these needs should be easy once you know who your audience is. Unfortunately, users are not a uniform group; they have different product knowledge, different backgrounds and they may have different reasons for using the product.

Basic audience analysis usually identifies these three subgroups:

  • Beginners who need to get an overall understanding of the product's scope and purpose.
    They need to know the context to understand what the product does and why the product is useful to them. This is what will get them to the intermediate level. Studies show that beginners who can’t move to the intermediate level usually quickly abandon a product.
  • Intermediate users who must know their tool set and how to make use of it. This group usually makes up most of the audience.
    They crucially need both reference guides with great indexes and search tools to find specific information. They need to answer questions like: What does this control do? How do I set this option? What will happen (or not happen) if I don't set this option?
  • Experts who want to know shortcuts and to learn how to automate processes.

Basically, the challenge is addressing each audience’s needs at its own level of product understanding, without burdening users with information they do not need or are not ready to understand. Studies show that when people don’t find what they need right away, they quickly abandon the task. Other audience factors depend on the specific product or industry, such as the users' possible roles and goals or, in the software industry, their level of computer literacy [Cooper, Alan, Reimann, Robert].

Providing tools for finding and extracting information quickly

Users often refer to documentation after they have already tried and failed to figure things out on their own. They need to find the information quickly. Good indexes, glossaries (that include synonyms) [Cooper, Alan, Reimann, Robert] and great search capabilities usually solve this issue. A good table of contents and document structure can help too [Feldman, Susan].

Producing multiple documentation products

Documentation teams' deliverables range from "getting started" guides to advanced user references, and they can be delivered on paper or by any electronic means.

Sometimes users read documentation from beginning to end, but most often they scan through for useful information. For example, a training manual is usually read and used differently than an API reference guide.

Creating and updating documentation as the products are being created

Time is an important factor. Documentation comes at the end of the development cycle, right before translation and delivery. Yet delivery dates are often based on product development alone, so there is very little time to integrate screenshots, if any, and last-minute changes. Making updates can be difficult when there are many changes in each document. There is a high risk of inconsistencies and human errors.

Keeping translation costs as low as possible

When information is found in more than one document, it needs to be translated more than once. Each update has a domino effect on translation costs. Moreover, inconsistencies in translations are frequent in large projects and can lead to both user confusion and dissatisfaction.

The project specifications

The goal of the CEDROM-SNi project was to document a service that offers news documents (newspaper articles, radio and TV show transcripts, etc.). The Web application is used mostly by private companies, governments, associations, libraries, schools and universities for media monitoring, e-press clipping, archiving and research.

The service is offered through three different product interfaces in three different locales (en-ca, fr-ca, fr-fr). Product features are also offered through Web Services that companies can use to provide access to news documents from their own website.

Deliverables include:

  • An online reference guide for each interface.
    Different users have access to different features based on the package they purchased and their user rights.
  • PDF trainer and trainee guides.
    This material must be easy to reorganize and reassemble. The trainer must be able to adapt it for specific groups, as many companies use the product for different purposes and have different knowledge of prior product versions.
  • An online Web Services guide that includes the API and XML schema reference information.

The audience is made up of:

  • Frequent users, some of whom are more computer literate than others.
  • Very advanced users, where some can write very complex queries manually to search for news documents without using forms.
  • Developers/programmers (for Web Services).

Information to include in deliverables is:

Table 1
                          Online Reference Guides   Online Web Services Guide        PDF Training Guides
Functionalities*          X                         X                                X
Tasks*                    X                         X (not same as user interface)   X
Exercises                 -                         -                                X (answers for trainer only)
Trainers' Notes           -                         -                                X
API (WSDL)                -                         X                                -
XML Schema documentation  -                         X                                -

* Requires support for conditional text, since different audiences have access to different tasks and features.

Laying out the documentation elements as in the preceding table shows the great potential for reusing information.

There was an extra challenge to this project: the overall specification was undefined at the start of the project. Although basic blocks were defined, how they came together was only defined as the project developed. Changes were frequent as users tested the new interfaces, and the basic user navigation was not finalized until the end of the project. For example, the "theme" feature was developed in the alpha release, but how to use it for a press review was only finalized after the interface was tested by multiple user groups.

Writers prefer to guide users through user processes rather than through a list of features, but because of the timetable, the system tasks had to be documented before all components for the real user tasks were ready. This made it difficult for the writers to create a good document structure or to base the table of contents on user tasks rather than on the system features. There was an immediate need to create information as chunks that could be moved around as the project evolved.

Documentation Team: 1 person who has access to XML-knowledgeable people.

Deadlines:

  • 1st day on project: November 2003
  • Alpha with basic tasks and features: March 2004
  • Beta with major tasks and features: June 2004
  • Web Services: July 2004
  • Final version in three languages with all features: September 2004

Presenting DITA's main advantages for CEDROM-SNi's project

This section covers the DITA features that are most relevant to the CEDROM-SNi project. You can get a fuller introduction to DITA at http://www-106.ibm.com/developerworks/xml/library/x-dita1/index.html.

DITA offers topics as the basic information block for reuse

Besides the generic topic type, DITA proposes these topic types:

  • Concepts, which provide information that answers the "what" and "why" questions important to beginners and some intermediate-level users.
  • Tasks, which answer the "how to" questions.
  • References, such as API documentation.

Why use topics as the base unit?

The topic is the smallest independently maintainable unit of content. Topics must be able to stand alone so that they can be understood when they are encountered out-of-context, for example when a user finds the topic through search, an index, or by following a link. [Priestley, M.]

It made sense for this project to use topics since the documentation was well-suited to chunking into the granular topics that would fit into the proposed DITA topic base. In our case, features were presented as DITA concepts, system tasks as DITA tasks and the Web Services APIs and XML schemas as DITA references.
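
To make this mapping concrete, the following Python sketch builds a minimal DITA task with the standard library. The element names (task, title, shortdesc, taskbody, steps, step, cmd) are DITA's own; the topic id, the wording of the steps and the helper name build_task are invented for illustration and are not taken from the CEDROM-SNi content.

```python
import xml.etree.ElementTree as ET


def build_task(topic_id, title, shortdesc, commands):
    """Return a minimal DITA <task> element with one <step> per command."""
    task = ET.Element("task", id=topic_id)
    ET.SubElement(task, "title").text = title
    ET.SubElement(task, "shortdesc").text = shortdesc
    steps = ET.SubElement(ET.SubElement(task, "taskbody"), "steps")
    for command in commands:
        ET.SubElement(ET.SubElement(steps, "step"), "cmd").text = command
    return task


# Hypothetical content: the product's real steps are not given in this paper.
save_search = build_task(
    "save_search",
    "Saving a news articles search",
    "Save a search so that it can be run again later.",
    ["Run the search.", "Choose the save option.", "Give the search a name."],
)
print(ET.tostring(save_search, encoding="unicode"))
```

Because the topic carries its own title and short description, it can be read on its own, which is what makes it a candidate for reuse in several deliverables.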

Another advantage of working with topics is that it reinforces the ability to write independent chunks of information that can be reused in different contexts. For example, the task "saving a news articles search" can be used on its own in the online reference guide for intermediate-level users who need to be reminded how to do it; it can also be used in beginners' training sessions to show why they would want to save a search they created.

Reuse and modularity have other immediate positive side effects:

  • Reduced update cost since fewer modifications need to be made.
  • Reduced translation cost and time since less content needs to be translated (and retranslated after updates).
    Because topics are kept in different files, it is easier to identify topics that have been modified from those that have not, and that, therefore, do not need to be translated after project updates.
  • Reduced risk of errors since it removes the tedious task of updating the same information in multiple places, which always carries the risk of missing one of them.
  • Increased consistency since tasks and concepts are defined the same way everywhere. This makes it easier for a user to identify the information patterns and reduces the risk of inconsistencies that could confuse the users.
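
Keeping one topic per file makes change detection for translation almost mechanical. The sketch below is one possible approach, not the process used on the project: it assumes a manifest (a plain dictionary here) that records a digest of each topic file at the time it was last sent to translation. The function names and the manifest itself are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return a SHA-256 digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def topics_to_translate(topic_dir, last_translated):
    """List topic files that are new or changed since the last translation.

    last_translated maps a file name to the digest recorded when that
    topic was last sent to translation (a hypothetical manifest).
    """
    changed = []
    for path in sorted(Path(topic_dir).glob("*.xml")):
        if last_translated.get(path.name) != fingerprint(path):
            changed.append(path.name)
    return changed
```

Only the files returned by topics_to_translate would need to go back to the translator; untouched topics keep their existing translation.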

DITA allows for topic specialization

Specialization is the process by which DITA lets you define your own topic types from existing ones.

Not all identified building blocks of the project fit the proposed DITA topics. The ability to create our own topic types was a very important factor, especially for those exercises in training manuals that didn't fit any of the proposed topic types, other than the generic basic topic.
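
DITA records a specialized element's ancestry in its class attribute, which is how generic processing can still handle a new topic type. The Python sketch below illustrates that lookup. The "exercise" topic type and its class value are hypothetical examples, not the project's actual specialization; in real DITA files the class values are normally supplied as defaults by the DTD, whereas here they are written out explicitly because the standard-library parser does not read DTD defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialization of the generic topic type.
EXERCISE = """<exercise id="ex1" class="- topic/topic exercise/exercise ">
  <title class="- topic/title ">Exercise: save a search</title>
</exercise>"""


def ancestry(element):
    """Return the module/element tokens of a DITA class attribute, base type first."""
    return [token for token in element.get("class", "").split() if "/" in token]


def is_kind_of(element, ancestor):
    """True if the element is, or is specialized from, the given type."""
    return ancestor in ancestry(element)


exercise = ET.fromstring(EXERCISE)
print(ancestry(exercise))
print(is_kind_of(exercise, "topic/topic"))
```

A stylesheet written for the generic topic can therefore format an exercise until a more specific rule is added for it.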

DITA maps enable organizing and reorganizing content

DITA maps are used to identify topics to include in a project. They can also be used to define relationships between topics, to create navigational tools, or to add metadata to topics.

Maps were very important to meet our need to present information in different orders in the different documentation products, especially for training documents, since the content differs based on each customer's particular needs. For example, training new employees in the basics of using the product, training regular users to use new features in a new version or teaching librarians to use queries for advanced searches might use different topics but might also use some of the same topics presented in a different order. Creating a different map for each documentation project is an easy and rapid task compared to other alternatives such as copying and pasting topics in each project and then updating each occurrence of the same topic in multiple manuals.

We are also using DITA maps to create "related links" at the end of each online topic that are customized to each deliverable's context.
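
As an illustration of how two maps can arrange the same topics differently, the following Python sketch reads two small maps and lists their topic references in order. The map titles, file names and function names are invented for the example; only the map and topicref elements and the href attribute come from DITA.

```python
import xml.etree.ElementTree as ET

# Hypothetical maps for two training audiences.
NEW_EMPLOYEES = """<map title="Training: new employees">
  <topicref href="concepts/search.xml">
    <topicref href="tasks/simple_search.xml"/>
    <topicref href="tasks/save_search.xml"/>
  </topicref>
</map>"""

LIBRARIANS = """<map title="Training: librarians">
  <topicref href="tasks/save_search.xml"/>
  <topicref href="concepts/query_syntax.xml">
    <topicref href="tasks/advanced_query.xml"/>
  </topicref>
</map>"""


def outline(map_xml):
    """Return (depth, href) pairs in map order, one per topicref."""
    entries = []

    def walk(node, depth):
        for ref in node.findall("topicref"):
            entries.append((depth, ref.get("href")))
            walk(ref, depth + 1)

    walk(ET.fromstring(map_xml), 0)
    return entries


def shared_topics(*maps):
    """Return the hrefs that appear in every one of the given maps."""
    sets = [{href for _, href in outline(m)} for m in maps]
    return set.intersection(*sets)


for depth, href in outline(NEW_EMPLOYEES):
    print("  " * depth + href)
print(shared_topics(NEW_EMPLOYEES, LIBRARIANS))
```

The topic files themselves are never copied: each map only points at them, so an update to a shared topic reaches every deliverable that references it.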

DITA is more than an architecture standard

DITA does define an architecture, but the DITA materials also include complete DTD components and sample XSL stylesheets for its proposed basic topic types.

Starting from an existing DTD saved us a lot of time and trouble. Moreover, being able to base our XSL on the samples provided by IBM allowed us to get started right away.

Implementing DITA for project delivery

The following three sections describe the processes for each major group of deliverables. However, the content files used for the entire project are all stored together in the same document base on the same server.

The basic project elements are:

  • Topics, each of which is a separate document. When combined with graphics and various XML files (such as XML schemas and WSDLs), they form the whole project content.
  • Non-topic content files. These include the WSDLs and XML schemas that document Web Services, graphic files, and any other content that is relevant to the documentation.
  • DITA maps specifying the topic organization for each documentation deliverable.
  • TEXTML Server, a native XML document base which stores all content files and indexes their content for search and retrieval.
  • XSL stylesheets for processing the XML content.
  • A Web application for delivering the online help.
  • XSL-FO processor for creating PDF manuals.
  • Scripts for pre-processing information and applying the XSL stylesheets in the right sequence and with the appropriate parameters. Scripts are especially useful for activities like creating tables of contents and indexes, or getting the titles for topics referenced in the related links section or by other hyperlinks.
    Pre-processing this information allows faster retrieval from the online help system.

Producing the user online help

The following figure presents the major steps in producing the online help from multiple DITA topics and a DITA map.

  1. Once all DITA topics and the map for the online help have been created, the project is ready to be processed and made available online. Topic files, graphics and the DITA map are located in a single TEXTML Server document base.
  2. The following transformations are applied to obtain multiple outputs from these files:
    • The map hierarchy is used to create a table of contents that appears as a tree in a tab at the left side of the screen.
    • Each topic is scanned for "indexterm" elements, which feed the index search engine that is also available from a tab at the left side of the screen.
    • For each related link for which no link text is provided, the title of the target file is extracted and displayed.
      This preprocessing is done because it takes too long to look for titles in multiple files while the user is waiting for the topic to be displayed.
  3. The resulting files are saved in the delivery document base.
    Titles are indexed separately from the rest of the content so that two search options – search in titles only or search in entire content – can be offered to the user.
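
Two of the preprocessing steps above, collecting index terms and resolving titles for related links, can be sketched in Python as follows. This is an illustration only, not the project's scripts: the topics are held in a dictionary instead of a TEXTML Server document base, and the file names, topic content and function names are invented. The indexterm, related-links, link and linktext elements are DITA's own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical topics, keyed by file name.
TOPICS = {
    "login.xml": (
        '<task id="login"><title>Logging in</title>'
        "<prolog><metadata><keywords>"
        "<indexterm>login</indexterm><indexterm>password</indexterm>"
        "</keywords></metadata></prolog><taskbody/>"
        '<related-links><link href="save_search.xml"/></related-links></task>'
    ),
    "save_search.xml": (
        '<task id="save_search"><title>Saving a search</title>'
        "<prolog><metadata><keywords><indexterm>search</indexterm>"
        "</keywords></metadata></prolog><taskbody/>"
        '<related-links><link href="login.xml">'
        "<linktext>Before you start</linktext></link></related-links></task>"
    ),
}


def build_index(topics):
    """Map each index term to the sorted list of files that contain it."""
    index = defaultdict(list)
    for name in sorted(topics):
        for term in ET.fromstring(topics[name]).iter("indexterm"):
            text = (term.text or "").strip()
            if text:
                index[text].append(name)
    return dict(index)


def resolve_link_titles(topics):
    """For each file, list (href, label) pairs for its related links.

    The label is the link's own text when present, otherwise the title
    of the target topic, looked up once ahead of delivery.
    """
    parsed = {name: ET.fromstring(xml) for name, xml in topics.items()}
    titles = {name: root.findtext("title") for name, root in parsed.items()}
    resolved = {}
    for name, root in parsed.items():
        resolved[name] = [
            (link.get("href"),
             link.findtext("linktext") or titles.get(link.get("href"), link.get("href")))
            for link in root.iter("link")
        ]
    return resolved


print(build_index(TOPICS))
print(resolve_link_titles(TOPICS))
```

Doing this lookup once, before delivery, is what avoids opening several files while a user waits for a topic to display.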

[Figure: search options]

The preceding graphic shows the search options that allow users to look for words or expressions in titles only or in the whole content.

We also index metadata in this project, and we include synonyms in the metadata so users can find topics about a subject even if the expression does not appear in the text itself. For example, in the task "Se connecter", we included the keyword "login", which is an anglicism often used by French Canadians who want information about logging into an application. If they search for "login", the application will return the task "Se connecter".

Since tasks are easy to distinguish from other information, we can return grouped search results where all generic topics, concepts and references are returned as "descriptions found" and all tasks are returned as "tasks found". It is an extra way to help the user find the right information quickly.
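
The search behaviour described above can be approximated with a few lines of Python. The sketch below is a simplified stand-in for the document base's search engine, which this paper does not detail: it does plain substring matching, and the topic content (apart from the "Se connecter" title and its "login" keyword) and the function name are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical French topics; "login" appears only as a metadata keyword.
TOPICS = {
    "se_connecter.xml": (
        '<task id="se_connecter"><title>Se connecter</title>'
        "<prolog><metadata><keywords><keyword>login</keyword>"
        "</keywords></metadata></prolog>"
        "<taskbody><context>Entrez votre nom et votre mot de passe.</context>"
        "</taskbody></task>"
    ),
    "recherche.xml": (
        '<concept id="recherche"><title>La recherche</title>'
        "<conbody><p>Trouver des documents de presse.</p></conbody></concept>"
    ),
}


def search(term, topics, titles_only=False):
    """Return matching topic titles grouped as tasks or descriptions."""
    term = term.lower()
    results = {"tasks found": [], "descriptions found": []}
    for name in sorted(topics):
        root = ET.fromstring(topics[name])
        title = root.findtext("title", default="")
        # Full-content search covers metadata keywords as well as body text.
        haystack = title if titles_only else "".join(root.itertext())
        if term in haystack.lower():
            group = "tasks found" if root.tag == "task" else "descriptions found"
            results[group].append(title)
    return results


print(search("login", TOPICS))
print(search("login", TOPICS, titles_only=True))
```

The first call finds "Se connecter" through its keyword; the second finds nothing, because the word does not occur in any title.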

NOTE:

These processes are all implemented in script files and are performed automatically once the technical writer double-clicks a simple .bat file.

Producing the Web Services online help

Producing the Web Services help is quite similar to producing the user online help. One extra transformation is needed because the topics are extracted from available XML files instead of being written by a technical writer. When developers create a Web Service, they provide a WSDL, which is an XML file that contains information about the Service. This information could be sufficient to document the Web Service, but it often lacks comments and is hard for people to read. Our developers create their Web Services in C#, and we agreed that comments would be added in XML and reviewed before publication. Therefore, all the information necessary to document the APIs can be extracted and presented as-is.

The Web Service description is extracted for each Service, transformed into a DITA reference topic and presented at the first tree level. Each method is transformed into a DITA reference and presented under the proper Web Service. Descriptions for parameters and return values come from comments in the C# code. The tree used to create the TOC is created automatically by scripts when the reference topics are created.
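
A minimal Python sketch of this extraction is shown below. It assumes that the reviewed comments surface in the WSDL as wsdl:documentation elements on the service and on each operation; the paper does not say exactly where they appear. The service and method names are invented, and the project itself produced these topics with its own scripts and stylesheets.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily trimmed WSDL 1.1 document.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="NewsSearch">
  <portType name="NewsSearchSoap">
    <operation name="Search">
      <documentation>Returns the news documents that match a query.</documentation>
    </operation>
    <operation name="GetDocument">
      <documentation>Returns one news document by identifier.</documentation>
    </operation>
  </portType>
  <service name="NewsSearch">
    <documentation>Search and retrieve news documents.</documentation>
  </service>
</definitions>"""


def doc_text(node):
    """Return the text of a node's wsdl:documentation child, or ''."""
    doc = node.find("wsdl:documentation", WSDL_NS)
    return (doc.text or "").strip() if doc is not None else ""


def reference_topic(topic_id, title, description):
    """Build a minimal DITA <reference> topic."""
    ref = ET.Element("reference", id=topic_id)
    ET.SubElement(ref, "title").text = title
    ET.SubElement(ref, "shortdesc").text = description
    ET.SubElement(ref, "refbody")
    return ref


def wsdl_to_references(wsdl_xml):
    """Return (service_topic, [method_topics]) for a one-service WSDL."""
    root = ET.fromstring(wsdl_xml)
    service = root.find("wsdl:service", WSDL_NS)
    name = service.get("name")
    service_topic = reference_topic(name, name, doc_text(service))
    method_topics = [
        reference_topic(f"{name}_{op.get('name')}", op.get("name"), doc_text(op))
        for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS)
    ]
    return service_topic, method_topics


service_topic, method_topics = wsdl_to_references(WSDL)
print(ET.tostring(service_topic, encoding="unicode"))
for topic in method_topics:
    print(ET.tostring(topic, encoding="unicode"))
```

The service topic sits at the first level of the tree and the method topics under it, so the table of contents can be generated in the same pass.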

Using DITA to document Web Services allowed me not only to make use of an already-defined transformation for formatting, but also to reuse the functionality definitions created for the regular online help delivery. These definitions are grouped in a section that explains the purpose of the Web Services. This kind of reuse is enabled by DITA, but it needed to be reinforced by writing guidelines: feature definitions were to focus only on the purpose of the feature and how it is useful in user processes such as media monitoring or archiving. No interface-specific information was allowed in the feature definitions.

Producing the PDF training manuals

The first step to producing the PDF manual is the same one as for producing online help: list the topics you want to use in a DITA map. However, the processing is slightly different: the first processing step merges the topics together into a single file. Then, in a second step, the output is produced by sending an XSL-FO stylesheet and the merged XML file to the XSL processor.
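
The merge step can be sketched in Python as follows. This is an assumption-laden illustration and not the project's merge script: topics are read from a dictionary, child topics are nested inside their parent to mirror the map hierarchy, and the file names and function name are invented. The dita wrapper element is the standard DITA container for several topics in one file.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic files and map.
TOPIC_FILES = {
    "concepts/search.xml":
        '<concept id="search"><title>Searching</title><conbody/></concept>',
    "tasks/save_search.xml":
        '<task id="save_search"><title>Saving a search</title><taskbody/></task>',
}

TRAINING_MAP = """<map>
  <topicref href="concepts/search.xml">
    <topicref href="tasks/save_search.xml"/>
  </topicref>
</map>"""


def merge_topics(map_xml, load_topic):
    """Merge the topics referenced by a map into one <dita> element.

    load_topic is any callable that returns the XML text for an href.
    Nested topicrefs become nested topics, so the merged file keeps the
    hierarchy of the map.
    """
    merged = ET.Element("dita")

    def walk(map_node, parent):
        for ref in map_node.findall("topicref"):
            topic = ET.fromstring(load_topic(ref.get("href")))
            parent.append(topic)
            walk(ref, topic)

    walk(ET.fromstring(map_xml), merged)
    return merged


merged = merge_topics(TRAINING_MAP, TOPIC_FILES.__getitem__)
print(ET.tostring(merged, encoding="unicode"))
```

The merged element would then be written to a file and passed, together with the XSL-FO stylesheet, to the XSL processor; that second step is not shown here.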

Our first choice of XSL formatter was "fop", which is free software. Once all our content is developed, we'll choose the tool that best renders our particular content.

Features of our PDF manuals include the table of contents, which is exactly the same hierarchy as the DITA map file, and the index at the end of the document, which is extracted from the "indexterm" element in each topic.

What's next?

Although the first part of the project is over, there is still much to do:

  • When the user/account types for the application are finalized, conditional text based on audience type will be added.
    This is one of the reasons why documents in the delivery document base are kept as XML files instead of XHTML files.
  • Now that the basic project structure is up and running, other people will join the project. Trainers will create their own topics and maps. A first step in that direction was taken, and its advantages were understood right away. However, there is still a need to implement workflow and to define guidelines for using the available elements consistently.
  • Another project is to implement a system to track users' search queries and their most-viewed documents.
    This should help us to identify when more accurate synonyms, new features, etc., are needed.
  • For training guides, we would like to produce overviews based on topic groups and information available within topics.
    We will use text from the short description (shortdesc) element to create these overviews.
  • For the Web Services help, we want to create API overviews, each with a short method description that is based on the description available in each method topic.
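
The conditional text mentioned in the first item above could be filtered along the following lines. This Python sketch is only one way to do it and is not something the project had implemented at the time of writing. It relies on DITA's audience attribute; the audience values "trainer" and "trainee", the exercise content and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical exercise with one trainer-only step.
EXERCISE = """<task id="exercise_1">
  <title>Exercise: build a press review</title>
  <taskbody>
    <steps>
      <step><cmd>Create a theme.</cmd></step>
      <step audience="trainer"><cmd>Show the expected result to the group.</cmd></step>
    </steps>
  </taskbody>
</task>"""


def filter_audience(element, audience):
    """Remove descendants whose audience attribute excludes the given audience.

    Elements without an audience attribute are kept for everyone.
    Note: removing an element also drops any text that directly follows
    it, so this sketch suits block-level elements such as steps.
    """
    for child in list(element):
        values = child.get("audience")
        if values is not None and audience not in values.split():
            element.remove(child)
        else:
            filter_audience(child, audience)
    return element


trainee_view = filter_audience(ET.fromstring(EXERCISE), "trainee")
print([cmd.text for cmd in trainee_view.iter("cmd")])
```

Keeping the delivered documents as XML, as noted above, is what leaves room for this kind of filtering at display time.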

Issues

The whole process of moving to XML and DITA was less tedious than first expected. However, certain issues did come up that should be considered by others who would like to walk the XML/DITA path.

  • It is somewhat harder to write text in an XML editor than it is to write in a regular editor. Although working with XML and DITA helps you be more consistent in your structure, it does not allow you to easily read what you just wrote. Tags are noise to the eyes and the brain. Moreover, you have to forget about one much-appreciated tool: the grammar/syntax dictionary.
  • Learning to write in a modular framework means learning to think and work differently, just as programmers needed to learn to think differently when object oriented programming was first introduced.
  • In a single-person team, there is a lot to do: creating XSL and XSL-FO stylesheets, scripting, writing, tagging, managing translation. For a single person to do all this without prior XML/DITA experience, there is a lot to learn. I would not recommend this to anyone who cannot get support from other teams.
  • We reached a limit with translation tools/people. Few small translation agencies can deal with XML. Moreover, it seemed that tools like Trados and SDLX have a hard time dealing with multi-level DTDs.
    As this article is being written, we have found someone who can deal with these issues. Solutions to translating documents based on multi-level DTDs include: 1) not importing the DTDs, which means the translator needs to remember which elements and attributes not to translate, or 2) merging the DTDs into a single large DTD that the tool can manage.
  • The biggest problem I met was keeping tag usage consistent. Although the tags are very specific, I had a tendency, at first, to mix up some of them, such as the concept's context and the short description.
    This issue can be resolved by using company guidelines for using the tools. So far, I was the only writer on the project, but as trainers and support teams start editing content and metadata, having rules will become increasingly important.
    Part of the problem lies in the way elements are rendered by the stylesheets. I based my stylesheets on my first understanding of DITA elements, but they did not display elements esthetically in every context.
  • Including other people in the production cycle is also a challenge. As the project has reached the end of the first development cycle, other parties are introduced to creating and maintaining topics.
    There will be a need to implement a workflow process to make sure topics are not duplicated and that new and updated documents are reviewed before they are included in delivered documents. We plan to reinforce that workflow by using the server's ability to manage user access and document properties.

Conclusion

Structuring the project's information with DITA was easy. The fact that there was no legacy content probably made the whole process simpler. I was able to create multiple user tasks from smaller system tasks and to document the system while user scenarios were still being defined.

Using DITA allowed for a quick start. Having access to a DTD set and sample XSL transformations saved me a lot of time. However, there was definitely an adaptation period; time was necessary to learn how to write within the information structure, use the tools and be comfortable with the tag set.

The overhead needed to adapt was worth it for this project because of the necessity to build multiple training guides and the changing nature of user tasks over the development period. If no extensive reuse had been needed, it might not have been worth the time and effort.

The first part of the project, the system implementation, was a success. The next part includes a bigger human factor as other content developers will need to be able to modify topics. We do not foresee particular challenges related to DITA regarding our workflow process development.

DITA and XML allowed for a lot of automation and reuse, and processes that have been defined will serve as a solid structural foundation for future projects. In fact, we are already using what we built in a new project.


Bibliography

[Ament, Kurt] Single Sourcing – Building Modular Documentation, William Andrew Publishing, 2003.

[Cooper, Alan, Reimann, Robert] About Face 2.0: The Essentials of Interaction Design, Wiley Publishing, 2003.

[Coverpage's technology report] http://xml.coverpages.org/dita.html#relatedTM.

[Day, D., Priestley, M., Schell, David A.] Introduction to the Darwin Information Typing Architecture – Toward portable technical information, http://www-106.ibm.com/developerworks/xml/library/x-dita1/.

[Duffy, Tommy] Build an XML-based Tree Control with JavaScript, DevX.com, http://www.devx.com/getHelpOn/Article/11874.

[Feldman, Susan] The cost of not finding information, KMWorld, Volume 13, March 2004.

[Hackos, JoAnn] Content Management for Dynamic Web Delivery, John Wiley & Sons, February 28, 2002.

[Priestley, M.] Scenario-based and model-driven information development with XML DITA, xml.coverpages.org/PriestleyACMSIGDOC-2003-DITA.pdf.

[Rockley, Ann] The impact of single sourcing and technology, Technical communication, Volume 48, Number 2, May 2001.

[The Center for Information-Development Management] Making a business case for single-sourcing, Best Practices, Volume 3, Number 2, April 2001.


