<?xml version="1.0" encoding="ASCII"?><?xml-stylesheet type="text/xsl" href="../../../mathml/pmathml.xsl"?><html xmlns="http://www.w3.org/1999/xhtml" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:space="preserve">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
      <title>Proceedings of Extreme Markup Languages&#174;</title>
      <link rel="stylesheet" href="../../../extreme-proceedings.css" type="text/css"/>
   </head>
   <body>
      <div id="head">
         <div class="inner">
            <img class="right" src="../../../icons/ExtremeNoDates.jpg"/>
            <h2>
               <i>Proceedings of Extreme Markup Languages<sup>&#174;</sup>
               </i>
            </h2>
         </div>
      </div>
      <div id="nav">
         <table width="100%" cellspacing="5">
            <tr height="29">
               <td class="button" width="20%" align="center">
                  <a title="Master Bibliography" href="../../../biblio.html">Master Bibliography</a>
               </td>
               <td class="button" width="20%" align="center">
                  <a title="Author Index" href="../../../authors.html">Author Index</a>
               </td>
               <td class="button" width="20%" align="center">
                  <a title="Topic Index" href="../../../topics.html">Topic Index</a>
               </td>
               <td class="button" width="20%" align="center">
                  <a title="Date Index" href="../../../dates.html">Date Index</a>
               </td>
               <td class="button" width="20%" align="center">
                  <a title="Proceedings Home" href="../../../index.html">Proceedings Home</a>
               </td>
            </tr>
         </table>
      </div>
      <div id="left1">
         <div class="inner">
            <h4>Converting into pattern-based schemas: a formal approach</h4>
            <address>Antonina Dattolo <br class="br"/>
               <a href="mailto:dattolo@unina.it" class="mailto">dattolo@unina.it</a>
            </address>
            <address>Angelo Di Iorio <br class="br"/>
               <a href="mailto:diiorio@cs.unibo.it" class="mailto">diiorio@cs.unibo.it</a>
            </address>
            <address>Silvia Duca <br class="br"/>
               <a href="mailto:ducas@cs.unibo.it" class="mailto">ducas@cs.unibo.it</a>
            </address>
            <address>Antonio Angelo Feliziani <br class="br"/>
               <a href="mailto:afelizia@cs.unibo.it" class="mailto">afelizia@cs.unibo.it</a>
            </address>
            <address>Fabio Vitali <br class="br"/>
               <a href="mailto:fabio@cs.unibo.it" class="mailto">fabio@cs.unibo.it</a>
            </address>
            <div class="abstract">
               <h4>Abstract</h4>
               <p class="first">A traditional distinction among markup languages is how descriptive or prescriptive they
        are. We identify six levels along the descriptive/prescriptive spectrum. Schemas at a
        specific level of descriptiveness that we call "Descriptive No Order" (DNO) specify a list
        of allowable elements, their number and requiredness, but do not impose any order upon them.
        We have defined a pattern-based model based on a set of named patterns, each of which is an
        object and its composition rule (content model), enough to write descriptive schemas for
        arbitrary documents. We show that any schema can be converted into a pattern-based one
        without loss of information at the DNO level (<i>invariant
        conversion</i>). We present a formal analysis of invariant conversions of arbitrary
        schemas as a demonstration of the correctness and completeness of our pattern model.
        Although all examples are given in DTD syntax, the results should apply equally to XSD,
        Relax NG, or other schema languages.</p>
            </div>
            <p class="keywords">
               <b style="font-size:85%">Keywords:</b> 
               <a href="../../../topics/Modeling.html">Modeling</a>
            </p>
            <div class="contents">
               <h4>Table of Contents</h4>
               <dl>
                  <dt>
                     <a href="#t1">Introduction</a>
                  </dt>
                  <dt>
                     <a href="#t2">Descriptive markup languages and patterns</a>
                  </dt>
                  <dl>
                     <dt>
                        <a href="#t2-1">Simplifying descriptive schemas</a>
                     </dt>
                     <dl>
                        <dt>
                           <a href="#t2-1-1">Alternatives</a>
                        </dt>
                        <dt>
                           <a href="#t2-1-2">Mixed content models</a>
                        </dt>
                     </dl>
                     <dt>
                        <a href="#t2-2">Patterns for descriptive document structures</a>
                     </dt>
                  </dl>
                  <dt>
                     <a href="#t3">Invariant conversion among descriptive schemas</a>
                  </dt>
                  <dl>
                     <dt>
                        <a href="#t3-1">Refining the notion of descriptiveness</a>
                     </dt>
                     <dt>
                        <a href="#t3-2">Introducing the notion of Invariant Conversion</a>
                     </dt>
                  </dl>
                  <dt>
                     <a href="#t4">Invariant conversion towards pattern-based documents: a formal analysis</a>
                  </dt>
                  <dl>
                     <dt>
                        <a href="#t4-1">Grammars and formal languages</a>
                     </dt>
                     <dt>
                        <a href="#t4-2">The general grammar G</a>
                     </dt>
                     <dt>
                        <a href="#t4-3">Our grammar P</a>
                     </dt>
                  </dl>
                  <dt>
                     <a href="#t5">Invariant conversion between DTDs</a>
                  </dt>
                  <dt>
                     <a href="#t6">Extending the notion of invariant conversion: XML-Schema and RelaxNG</a>
                  </dt>
                  <dt>
                     <a href="#t7">Conclusions</a>
                  </dt>
               </dl>
            </div>
            <div class="authorBio">
               <h4>Antonina Dattolo</h4>
               <p class="first">Antonina Dattolo is a research associate at the Department of Mathematics and
          Applications "R. Caccioppoli" at the University of Naples Federico II. She holds a Laurea
          degree in Computer Science from the University of Salerno and a Ph.D. in Applied
          Mathematics and Computer Science from the University of Naples Federico II. Her research
          interests include markup languages; concurrent architectures for distributed hypermedia
          models; and software agents. She is the author of several papers on distributed
          hypermedia models.</p>
            </div>
            <div class="authorBio">
               <h4>Angelo Di Iorio</h4>
               <p class="first">Angelo Di Iorio holds a Laurea degree and a PhD in Computer Science from the
          University of Bologna. His research interests include content management systems, web
          technologies, markup languages and digital publishing.</p>
            </div>
            <div class="authorBio">
               <h4>Silvia Duca</h4>
               <p class="first">Silvia Duca holds a Laurea degree in Computer Science from the University of Bologna.
          Her research interests include the Semantic Web, web technologies, ontologies and markup
          languages.</p>
            </div>
            <div class="authorBio">
               <h4>Antonio Angelo Feliziani</h4>
               <p class="first">Antonio Angelo Feliziani holds a Laurea degree in Computer Science from the University
          of Bologna. His research interests include document management systems, web technologies,
          markup languages and e-Learning. </p>
            </div>
            <div class="authorBio">
               <h4>Fabio Vitali</h4>
               <p class="first">Fabio Vitali is an associate professor at the Department of Computer Science at the
          University of Bologna. He holds a Laurea degree in Mathematics and a Ph.D. in Computer and
          Law, both from the University of Bologna. His research interests include markup languages;
          distributed, coordinated systems; and the World Wide Web. He is the author of several
          papers on hypertext functionalities, the World Wide Web, and XML.</p>
            </div>
         </div>
      </div>
      <div id="paperLinks">
         <table width="100%" cellspacing="5">
            <tr height="18">
               <td class="button" width="25%" align="center">
                  <a title="XML Source" href="../../../xml/2007/Dattolo01/EML2007Dattolo01.xml">XML&#160;Source</a>
               </td>
               <td class="button" width="25%" align="center">
                  <a title="PDF Version" href="../../../xslfo-pdf/2007/Dattolo01/EML2007Dattolo01.pdf">PDF&#160;(for&#160;print)</a>
               </td>
               <td class="nobutton" width="25%" align="center">
                  <span class="nolink">Author&#160;Package</span>
               </td>
               <td class="nobutton" width="25%" align="center">
                  <span class="nolink">Typeset&#160;PDF</span>
               </td>
            </tr>
         </table>
      </div>
      <div id="right1">
         <div class="inner">
            <div class="front">
               <h1 class="title">Converting into pattern-based schemas: a formal approach</h1>
               <address>Antonina Dattolo [Department of Mathematics and Applications R. Caccioppoli, University of Napoli Federico II]</address>
               <address>Angelo Di Iorio [Department of Computer Science, University of Bologna]</address>
               <address>Silvia Duca [Department of Computer Science, University of Bologna]</address>
               <address>Antonio Angelo Feliziani [Department of Computer Science, University of Bologna]</address>
               <address>Fabio Vitali [Department of Computer Science, University of Bologna]</address>
               <h3 class="conference">Extreme Markup Languages 2007&#174; (Montr&#233;al, Qu&#233;bec)</h3>
               <h4>
                  <i>Copyright &#169; 2007 Antonina Dattolo, Angelo Di Iorio, Silvia Duca, Antonio Angelo Feliziani, and Fabio Vitali. Reproduced with permission.</i>
               </h4>
            </div>
            <div class="mathml-warning">
               <p>
                  <i>
                     <b>Note:</b>
                  </i> This paper contains <a href="http://www.w3.org/Math/">W3C MathML</a>,
          which is not equally well supported in all browsers. If you have reason to think 
          that mathematical expressions are not displaying properly, consult the 
          <a href="../../../xslfo-pdf/2007/Dattolo01/EML2007Dattolo01.pdf">PDF version</a> (or try a different browser).</p>
            </div>
            <div class="section">
               <h2>
                  <a name="t1"/>Introduction</h2>
               <p> Given any set of XML documents, there is an open-ended number of schemas that validate all
        members of the set and reject all non-members. Not only do schema languages allow some
        linguistic variety in expressing the same constraints, but different constraints can also be
        devised that accept and reject exactly the same instances. </p>
               <p> Of course, when dealing with XML documents, validation is only part of the story.
        Classification (i.e., the ability to associate meaning and procedures with each part of the
        document) is also a key aspect of XML applications, and many would say it is actually more
        important than validation. When classifying documents, the correct association of each
        element with its label is more important than clearly specifying the set of rejected
        documents. Thus, not only can different schemas be associated with the same set of documents
        for validation purposes, but the different emphasis on validation vs. classification
        can also imply or require different features to be made available in the schemas themselves,
        further increasing their total number. </p>
               <p> Given the different emphases that can be placed when working with XML documents,
        schemas for the same set of documents can be written from different perspectives, each
        looking for different things. Design patterns applied to the design of such
        schemas can help determine which things are important to check in an XML document and
        which are not. </p>
               <p> In particular, we have in the past fruitfully applied design patterns to descriptive
        schemas, where it is more important to describe what is in a document than to check for
        violations of structural rules. This paper has grown out of a question raised at Extreme
        2005, after the presentation of our work on a pattern-based approach for descriptive schemas
          <b>
                     <span style="font-size:85%">
                        <a href="#DIGV05" name="fromDIGV05">[DIGV05]</a>
                     </span>
                  </b>. On that occasion, we presented a very small set of patterns
        meant to design any descriptive schema for arbitrary documents. During the question period, it
        was recommended that we further investigate the properties of this model in a formal way.</p>
               <p>The goal of this work is then to demonstrate the completeness of those patterns as a
        means for <i>descriptive validation</i> (validation against
        descriptive schemas) of document classes. We want to show that any schema can be
        automatically converted into an 'equivalent' one (with respect to descriptiveness) that is
        based exclusively on our patterns, and that such a schema accepts the same class of documents. </p>
               <p> This of course requires explaining both what concept of "equivalence" we have adopted
        and what such a conversion actually means. First, however, we need to clarify what
        we mean by 'descriptive schemas' and 'descriptive validation', and what we intend these
        descriptive schemas to be used for. In many ways, this work takes as its starting point the
        traditional dichotomy between prescriptive and descriptive markup languages <b>
                     <span style="font-size:85%">
                        <a href="#Ren00" name="fromRen00">[Ren00]</a>
                     </span>
                  </b>
        
                  <b>
                     <span style="font-size:85%">
                        <a href="#Qui96" name="fromQui96">[Qui96]</a>
                     </span>
                  </b>. We note that many prescriptive constructs are used in descriptive
        contexts (and vice versa), and that these approaches are very often mixed with each other.
        This may happen for different reasons. Some schemas are widely known, and users
        prefer to adapt and reuse them rather than build new ones (not a bad choice,
        considering the many benefits that derive from reuse). In addition,
        designers often receive only partial information about the domain they are modeling and about the exact
        kind of validation that will actually be needed. Last but not least, it is undeniable that
        the constraining constructs available in validation languages are often too powerful and easily
        lead to over-designed schemas.</p>
               <p>In 2005 we started discussing 'descriptive schemas' and proposed some substantial
        simplifications that can be made by considering the nature and purposes of the documents being
        analyzed. These simplifications allow us to write simple and well-engineered schemas that
        better fit descriptive needs. This work further delineates and formalizes those initial
        ideas.</p>
               <p>To this end, we need to refine our notion of descriptiveness. A first result is the
        identification of some subclasses of descriptive approaches for markup languages (that we
        call here "levels of descriptiveness"). Each subclass corresponds to some common needs and
        preferences according to which designers can relax constraints, express information by
        adopting specific rules and patterns, and reformulate declarations. For instance, we
        indicate with DNA (Descriptive No Alternatives) those schemas where alternative choices are
        so irrelevant as to be omittable, with DNC (Descriptive No Cardinality) those schemas where
        rules over cardinality can be relaxed, and with UD (Undescriptive) those schemas that accept
        all content models.</p>
               <p> In particular, we identified the DNO level (Descriptive No Order) as a good solution
        for the 'descriptiveness' objectives we discussed in our previous paper: such schemas do not
        use alternatives, express the cardinality of each element, and relax constraints on element
        order. Starting from this classification, and after summarizing the discussion about the expressivity
        and scope of DNO schemas, we will present a reduction algorithm that generates a DNO schema
        from any schema and that is invariant with respect to the descriptiveness of the constraints.</p>
               <p>The analysis is divided into two main parts: a formal study of the conversion algorithm for
        XML DTDs, based on grammars and language theory, and an informal discussion of XML-Schema
        and RelaxNG schemas. Future versions of this work will apply formal techniques to these
        languages as well. The paper is organized as follows: section 2 discusses descriptive
        schemas and the relaxation of constraints on that class of documents; section 3 refines our notion
        of descriptiveness and invariant conversion; sections 4 and 5 formalize grammars for XML
        DTDs and present our reduction algorithm; finally, section 6 analyzes XML-Schema and RelaxNG
        in order to extend our approach to other languages.</p>
            </div>
            <div class="section">
               <h2>
                  <a name="t2"/>Descriptive markup languages and patterns</h2>
               <div class="subsec1">
                  <h3>
                     <a name="t2-1"/>Simplifying descriptive schemas</h3>
                  <p>The analysis of descriptive markup languages is not new. Many researchers have investigated
          the inherent nature and purposes of these languages by comparing them with
          prescriptive approaches. For instance, <b>
                        <span style="font-size:85%">
                           <a href="#Qui96" name="fromQui96">[Qui96]</a>
                        </span>
                     </b> focused on DTDs: a
          prescriptive DTD may be designed to create new material or to mark up existing material,
          and prescribes a set of rules that all matching documents must follow; a descriptive DTD
          is used to create an electronic version of material that already exists (of course, a
          descriptive model may also be used to create new documents) and describes structures that
          exist, rather than forcing any particular structure. </p>
                  <p>A useful perspective from which to study these approaches, and to identify application areas for
          each of them, is the concept of validation that each implements. As
          outlined by Piez <b>
                        <span style="font-size:85%">
                           <a href="#Pie01" name="fromPie01">[Pie01]</a>
                        </span>
                     </b>, two classes of validation can be identified,
          roughly corresponding to prescriptive and descriptive schemas: <i>strict</i> and <i>loose</i>. The traditional way
          of conceiving validation is &#8220;strict&#8221;, because validation is used as a &#8220;go/no-go&#8221; gauge to
          verify in advance whether or not a data set conforms to a set of requirements. The example
          provided by Piez explains the role of such validation very well: the publishing process
          can be likened to an assembly line, and validation is a control phase that prevents errors
          and makes the whole system work. When a document fails validation, there is something
          wrong with it, something that has to be changed in the document itself. Strict validation
          is useful (and sometimes necessary) as a means of splitting a complex job into sub-activities
          that can be accomplished by different actors with different skills and facilities. </p>
                  <p> Although less frequent, the opposite perspective is equally interesting: using validation
          to describe documents and to capture <i>a posteriori</i>
          structural information about a text. Piez defined such a process as <i>loose</i> validation. It might be important to trace those features
          of the text that matter to the author or to the encoder, rather than those constraints that are essential
          for subsequent operations on that text. Moreover, some features of
          documents may still be undefined when the schema is designed, and some instances (which
          should be considered valid) may still be unknown. Thus, a descriptive schema is not something
          that exists before an instance, as a set of rules to be followed; in a sense, it derives
          from instances, as an <i>ex post facto</i> expression of what
          can be discovered from them. As a consequence, such a schema is not composed of
          fine-grained declarations that capture variations and exceptions, but of
          generic rules that capture the overall meaning of a set of documents. </p>
                  <p> A clear identification of the contexts in which schemas are used has (or, at least, should have)
          a great impact on their design. In particular, we have noticed that descriptive schemas
          can be simplified by carefully taking into account their real scope and objectives. Two
          examples, which we also presented in 2005, help explain this simplification process
          (see our previous paper for more examples and a deeper analysis):</p>
                  <div class="subsec2">
                     <h4>
                        <a name="t2-1-1"/>Alternatives</h4>
                     <p>Let us consider a possible <i>either/or</i> situation:
            for instance, a document designer might decide that an address has either
            a P.O. Box or a street address. In a DTD-like syntax, this could be rendered by a rule
            such as: </p>
                     <div class="figure">
                        <a name="exd1a"/>
                        <h5>Figure 1: Expressing alternatives in a DTD-like syntax</h5>
                        <pre>&lt;!ELEMENT address (name, (pobox | street), city, ZIP, state)&gt;</pre>
                     </div>
                     <p>In a prescriptive document factory, this rule effectively prevents incorrect
            structures from being created, and ensures homogeneity in the created documents. In a
            descriptive environment, on the other hand, there is no homogeneity to be sought in the
            documents (they exist already); rather, it is important that all existing documents
            are marked up as faithfully as possible and without ambiguities. </p>
                     <p>Now two things may happen: if in the document set there is no example of a
            simultaneous presence of P.O. Box and street address, then this is a constraint that has
            no practical effect on reality, one additional check that was not needed. If, on the
            other hand, a document exists that has both a street address and a P.O. Box, then the
            rule does not allow a correct markup, and forces the document editor to find a hack
            around the constraints of the DTD. </p>
                     <p>A corresponding descriptive rule would therefore be:</p>
                     <div class="figure">
                        <a name="exd1b"/>
                        <h5>Figure 2: Expressing alternatives with a descriptive rule</h5>
                        <pre>&lt;!ELEMENT address (name, pobox?, street?, city, ZIP, state)&gt;</pre>
                     </div>
                     <p>where the alternative has been transformed into a sequence of optional elements.
            This rule has no effect on the final markup, exposes exactly the same meanings for
            documents that naturally follow the stricter rule, <b>but</b> allows for the exception in case one exists. </p>
                     <p>Alternatives do not capture additional semantics with respect to a sequence of
            optional elements, but <i>a priori</i> exclude some
            situations from occurring. Thus, in a descriptive environment they are useless at best
            (when all occurrences naturally follow the alternation) and a nuisance and an
            obstacle when an exception occurs. </p>
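                     <p>As a minimal illustration (the element values are hypothetical, and each child element
            is assumed to be declared as #PCDATA), the following instance carries both a P.O. Box and a
            street address. It conforms to the descriptive rule of Figure 2, whereas the rule of Figure 1
            would reject it, because that rule admits only one of the two elements: </p>
                     <pre>&lt;address&gt;
  &lt;name&gt;Jane Doe&lt;/name&gt;
  &lt;pobox&gt;P.O. Box 123&lt;/pobox&gt;
  &lt;street&gt;1 Example Street&lt;/street&gt;
  &lt;city&gt;Springfield&lt;/city&gt;
  &lt;ZIP&gt;00000&lt;/ZIP&gt;
  &lt;state&gt;XX&lt;/state&gt;
&lt;/address&gt;</pre>
                     <p>An instance that contains only one of the two elements remains valid under both rules,
            which is why the relaxed rule does not change the markup of documents that already
            follow the alternation. </p>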
                  </div>
                  <div class="subsec2">
                     <h4>
                        <a name="t2-1-2"/>Mixed content models</h4>
                     <p>Mixed content models are by definition used when describing semi-structured text
            flows that are part of larger contexts. Paragraphs that have meaningful subparts inside
            are natural candidates for mixed content models. </p>
                     <p>Each individual sub-element of a paragraph specifies some special meaning or style
            for the wrapped text. For this reason, all text elements within a sub-element of a
            paragraph are also part of the paragraph. In this view, it makes little sense for the
            sub-elements of a mixed content paragraph to be allowed to contain data that is not part of
            the paragraph text flow, since such data could be difficult to identify without precise
            advance knowledge of the meaning of the sub-element itself and of its further subparts. </p>
                     <p>Thus the only reasonable forms of mixed content models should be: </p>
                     <div class="figure">
                        <a name="exd2a"/>
                        <h5>Figure 3: Defining a mixed-content model to model paragraphs</h5>
                        <pre>&lt;!ENTITY % inline "(#PCDATA | a | b | ... | z)*"&gt;
&lt;!ELEMENT para %inline;&gt;
&lt;!ELEMENT a %inline;&gt;
&lt;!ELEMENT b %inline;&gt;
...
&lt;!ELEMENT z %inline;&gt;</pre>
                     </div>
                     <p>or, at most, if we want to exclude further nesting inside sub-elements, </p>
                     <div class="figure">
                        <a name="exd2b"/>
                        <h5>Figure 4: Defining a mixed-content model to model paragraphs, excluding further nesting
              inside sub-elements</h5>
                        <pre>&lt;!ENTITY % inline "(#PCDATA | a | b | ... | z)*"&gt;
&lt;!ELEMENT para %inline;&gt;
&lt;!ELEMENT a (#PCDATA)&gt;
&lt;!ELEMENT b (#PCDATA)&gt;
...
&lt;!ELEMENT z (#PCDATA)&gt;</pre>
                     </div>
                     <p>This form is meant to specify that the content models of all elements of a mixed
            content model are themselves mixed content (or simple text in the simplest cases), and that a
            block element is the only mixed content element whose content model list does not
            include itself (i.e., there is no para inside the inline entity). </p>
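                     <p>To make the difference between the two forms concrete, consider the following
            hypothetical instance, which uses only the element names a and b from the figures above (the
            text itself is invented for illustration). Because b is nested inside a, the instance
            conforms to the declarations of Figure 3 but not to those of Figure 4, where a may
            contain only character data: </p>
                     <pre>&lt;para&gt;Plain text, &lt;a&gt;marked text with a &lt;b&gt;nested&lt;/b&gt; part&lt;/a&gt;, and more plain text.&lt;/para&gt;</pre>
                     <p>In both forms, every character of the instance belongs to the text flow of the
            paragraph, which is the property the restriction is meant to guarantee. </p>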
                  </div>
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="t2-2"/>Patterns for descriptive document structures</h3>
                  <p>The main topic of the paper we presented at Extreme 2005 was a set of patterns that
          is sufficient for writing any descriptive schema. There we provided a fairly exhaustive
          description of our patterns, focusing on their features, applications and mutual
          relationships. We do not want to repeat that analysis here, but we need to focus on some
          features that are relevant to the conversion (and the formal proof) we explain later. Table 1 shows our
          patterns, defined by a very small set of objects and composition rules: </p>
                  <div class="table">
                     <h5>Table 1: Patterns</h5>
                     <table summary="Patterns" border="1" width="85%">
                        <colgroup>
                           <col/>
                           <col/>
                        </colgroup>
                        <tbody>

              

                           <tr>
                              <td align="center">

                  
                                 <b>Pattern</b>

                
                              </td>
                              <td align="center">

                  
                                 <b>DTD syntax</b>

                
                              </td>
                           </tr>

              

                           <tr>
                              <td align="center">Marker</td>
                              <td align="center"> &lt;!ELEMENT X EMPTY&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Atom</td>
                              <td align="center">&lt;!ELEMENT X (#PCDATA)&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Block</td>
                              <td align="center"> &lt;!ELEMENT X (#PCDATA | E1 | ... | En | M1 | ... | Mn | Ax)*&gt; </td>
                           </tr>

              

                           <tr>
                              <td align="center">Inline</td>
                              <td align="center">&lt;!ELEMENT E1 (#PCDATA | E1 | ... | En | M1 | ... | Mn | Ax)*&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Record</td>
                              <td align="center">&lt;!ELEMENT X (E1?, E2?, ..., En?)&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Container</td>
                              <td align="center">&lt;!ELEMENT X (E1 | E2 | ... | En)*&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Table</td>
                              <td align="center">&lt;!ELEMENT X (E)*&gt;</td>
                           </tr>

            
                        </tbody>
                     </table>
                  </div>
                  <p>The first thing to notice is the introduction of a new pattern, called <b>container</b>, used to model all those situations in which
          heterogeneous objects are repeated and collected together. The name emphasizes the
          generality of this pattern: a container is an unordered set of repeatable and
          heterogeneous elements:</p>
                  <div class="figure">
                     <a name="figXX"/>
                     <h5>Figure 5: The container pattern</h5>
                     <pre>&lt;!ENTITY % Blocks "( para | table | li )" &gt;
&lt;!ELEMENT body %Blocks; &gt;
&lt;!ELEMENT section %Blocks; &gt;
...</pre>
                  </div>
                  <p> As expected, the content-model of a container includes markers, atoms, blocks,
          records, tables and containers themselves. Only raw text and inlines are excluded, because
          they should be wrapped within a block. Containers are related to both the records and the
          tables of our model. On the one hand, they share the classes of elements allowed in a record
          (markers, atoms, blocks, records, etc.): only the repeatability of those
          elements changes, while order is irrelevant in both cases. On the other hand, they share
          repeatability with tables: the only difference is that the items of a container are
          heterogeneous, while those of a table are homogeneous. We use two separate patterns
          precisely to emphasize the difference between homogeneous and heterogeneous structures:
          patterns are meant to distinguish clearly the structural role of the objects.</p>
                  <p>Strictly speaking, we are discussing a specialization of the <b>container</b> pattern, the <b>hierarchical
          container</b>, which distinguishes the title from the actual content (an unordered
          set of repeatable elements). However, this difference is not relevant here: what
          matters is the presence of an object able to gather heterogeneous elements
          under the same logical wrapper.</p>
                  <p> Wrappers play a very important role in our model. They are specific elements used to
          gather uniform information, to group elements, and to emphasize the relationships among them.
          Basically, they spread the information over the depth of the document in order to reduce
          the need for complex constructs and to make the mutual connections among elements explicit. For
          instance, whenever a content model mixes repeated elements with
          single ones (or with alternatives), a new wrapper element can be created to model that
          scenario better. It replaces the &#8216;wrong&#8217; declaration fragment and inherits its content
          model. Consider for instance the element declaration <tt class="code">&lt;!ELEMENT X
            (A,(B|C))&gt;</tt>. A new element W (<tt class="code">&lt;!ELEMENT W
          (B|C)&gt;</tt>) can be introduced and substituted into the previous declaration, in order
          to obtain the homogeneous declaration <tt class="code">&lt;!ELEMENT X (A,W)&gt;</tt>. In
          turn, the declaration <tt class="code">&lt;!ELEMENT W (B|C)&gt;</tt> can be changed into
            <tt class="code">&lt;!ELEMENT W (B?, C?)&gt;</tt> when alternatives are actually
          irrelevant (i.e., in descriptive scenarios). Note that the new definition does not affect the
          expressivity of the schema: in particular, it does not change its descriptiveness, since it
          correctly generalizes the meaning and structure of the valid documents. </p>
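                  <p>The two rewriting steps just described can be sketched in a few lines of Python.
          This is only an illustration under our own assumptions: content models are represented as
          nested tuples, and the helper names <tt class="code">introduce_wrapper</tt> and
          <tt class="code">relax_choice</tt> are hypothetical, not part of any tool discussed in
          this paper. The sketch handles a single nested group, as in the example above.</p>

```python
# A content model is ("seq", items) or ("choice", items); an item is an
# element name (str) or a nested model. This encoding is an assumption
# made for illustration only.

def introduce_wrapper(model, wrapper_name):
    """Replace the nested group of `model` with a new wrapper element.

    Returns the rewritten model and a dict holding the wrapper declaration.
    Only one nested group is supported in this sketch.
    """
    kind, items = model
    new_items, wrappers = [], {}
    for item in items:
        if isinstance(item, tuple):
            if wrappers:
                raise ValueError("sketch supports a single nested group")
            wrappers[wrapper_name] = item   # the wrapper inherits the fragment
            new_items.append(wrapper_name)
        else:
            new_items.append(item)
    return (kind, new_items), wrappers


def relax_choice(model):
    """Turn a choice (B|C) into a record of optional elements (B?, C?)."""
    kind, items = model
    if kind != "choice":
        return model
    return ("seq", [name + "?" for name in items])


x_model = ("seq", ["A", ("choice", ["B", "C"])])   # X (A,(B|C))
new_x, wrappers = introduce_wrapper(x_model, "W")  # X (A,W) and W (B|C)
relaxed_w = relax_choice(wrappers["W"])            # W (B?, C?)
```
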
                  <p> Another point about patterns is worth noting: specific rules are imposed on
          the classes of objects allowed in the content-model of each pattern. For instance, an inline
          element can be contained only within a block, a container cannot directly contain plain
          text, a record or a table cannot be contained in a block, and so on. Table 2 shows these
          constraints (each row indicates the elements allowed in the content-model of that pattern). </p>
                  <div class="table">
                     <h5>Table 2: Composition rules over patterns</h5>
                     <table summary="Composition rules over patterns" border="1" width="85%">
                        <colgroup>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                           <col/>
                        </colgroup>
                        <tbody>

              

                           <tr>
                              <td align="center">&#160;</td>
                              <td align="center">EMPTY</td>
                              <td align="center">Text</td>
                              <td align="center">Marker</td>
                              <td align="center">Atom</td>
                              <td align="center">Block</td>
                              <td align="center">Inline</td>
                              <td align="center">Record</td>
                              <td align="center">Container</td>
                              <td align="center">Table</td>
                           </tr>

              

                           <tr>
                              <td align="center">Marker</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Atom</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Block</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Inline</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Record</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                           </tr>

              

                           <tr>
                              <td align="center">Container</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                           </tr>

              

                           <tr>
                              <td align="center">Table</td>
                              <td align="center">&#160;</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">&#160;</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                              <td align="center">X</td>
                           </tr>

            
                        </tbody>
                     </table>
                  </div>
                  <p>Although it may seem a limitation, such strictness helps to widen the
          expressiveness and the applicability of patterns. By limiting the possible choices, the
          role played by each pattern becomes highly specialized, and it is possible to associate a single
          pattern with each of the users&#8217; needs. For instance, by preventing records within blocks we prevent an
          uncontrolled mixing of structured and unstructured content; by preventing inlines outside
          blocks we prevent incorrect locations for text fragments; by preventing tables within
          blocks we ensure the distinction between block texts and complex data structures; and so
          on.</p>
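                  <p>The rows of Table 2 can be read as a lookup table. The following Python sketch
          transcribes them into a dictionary and reports which content classes a given pattern may
          not directly contain. The lowercase class names and the function
          <tt class="code">violations</tt> are our own illustrative choices.</p>

```python
# Transcription of Table 2: for each pattern, the classes allowed in its
# content model. Lowercase names are an illustrative convention.
STRUCTURED = {"marker", "atom", "block", "record", "container", "table"}

ALLOWED = {
    "marker": {"empty"},
    "atom": {"text"},
    "block": {"text", "marker", "atom", "inline"},
    "inline": {"text", "marker", "atom", "inline"},
    "record": set(STRUCTURED),
    "container": set(STRUCTURED),
    "table": set(STRUCTURED),
}


def violations(pattern, content):
    """Return the classes in `content` that `pattern` may not contain."""
    return sorted(set(content) - ALLOWED[pattern])


# A container that directly holds an inline breaks the rules of Table 2.
container_problems = violations("container", ["block", "inline"])
# A block holding text and inlines is fine.
block_problems = violations("block", ["text", "inline"])
```
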
                  <p> Wrappers can be used to &#8220;by-pass&#8221; all those situations where a constraint among
          patterns is violated. Consider, for instance, a container element declared as
            <tt class="code">&lt;!ELEMENT C (A|B)*&gt;</tt>, where B is an inline. A new block
          element, the wrapper W (<tt class="code">&lt;!ELEMENT W (#PCDATA|B)*&gt;</tt>), can be
          created and the definition of C can be changed into <tt class="code">&lt;!ELEMENT C
          (A|W)*&gt;</tt>. All changes introduced by wrappers are thus aimed at &#8220;cleaning&#8221; (or
          homogenizing) document structures. </p>
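                  <p>As a rough illustration of this repair, the Python sketch below moves the inline
          members of a container into a single block wrapper. The function name
          <tt class="code">wrap_inlines</tt>, the list-based representation of content models and
          the mapping from element names to patterns are assumptions made for the example.</p>

```python
def wrap_inlines(container_items, patterns, wrapper_name="W"):
    """Move inline members of a container into one block wrapper.

    container_items: element names in the container's (A|B|...)* model.
    patterns: hypothetical mapping from element name to its pattern.
    Returns the new container items and the wrapper declarations.
    """
    inlines = [e for e in container_items if patterns[e] == "inline"]
    if not inlines:
        return list(container_items), {}
    kept = [e for e in container_items if patterns[e] != "inline"]
    # The wrapper is a block: mixed content of text and the inlines.
    return kept + [wrapper_name], {wrapper_name: ["#PCDATA"] + inlines}


# C (A|B)* with B an inline becomes C (A|W)* plus W (#PCDATA|B)*
new_c, new_decls = wrap_inlines(["A", "B"], {"A": "block", "B": "inline"})
```
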
               </div>
            </div>
            <div class="section">
               <h2>
                  <a name="t3"/>Invariant conversion among descriptive schemas</h2>
               <div class="subsec1">
                  <h3>
                     <a name="t3-1"/>Refining the notion of descriptiveness</h3>
                  <p> The choice between descriptive and prescriptive models primarily depends on the
          relation between the process of actually writing a document and the process of encoding it.
          Descriptive models reflect the fact that authors have worked before the creation of the
          schema. Designers are then in charge of accommodating variations and exceptions, and of
          generalizing the features of a (possibly large) set of documents. Still, such generalization
          is not a pre-defined and univocal process. Designers may want to relax some
          constraints, omit or reformulate some definitions, or express information in a different way
          according to different criteria and preferences. What we presented in the previous section
          is only one of the possible solutions towards descriptiveness. </p>
                  <p>A deeper analysis leads us to identify six different subclasses of descriptive
          approaches, which we summarize in the following paragraphs. We call them <b>levels of descriptiveness</b>. Note also that our definitions,
          although they refer to DTDs, can be applied to any other validation language. What matters are
          the objectives and principles behind each paradigm: <table border="0" cellpadding="8" class="deflist">
                        <tr>
                           <td valign="top">

                
                              <b>Prescriptive (PRE)</b>

              
                           </td>
                           <td valign="top">
                              <p class="first">a prescriptive DTD imposes a set of rules which all matching documents must
                  follow. Its purpose is to prevent errors in a production chain by means of strict validation.</p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">

                
                              <b>Descriptive No Alternatives (DNA)</b>

              
                           </td>
                           <td valign="top">
                              <p class="first">a descriptive DTD without alternatives does not force a choice
                  between two (or more) elements. The basic idea is that alternatives are meant to
                  inhibit incorrect structures, but they are not required when all documents already
                  exist and the DTD is used to describe all those documents (including variations
                  and exceptions that a strict, prescriptive DTD would not anticipate).</p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">

                
                              <b>Descriptive No Cardinality (DNC)</b>

              
                           </td>
                           <td valign="top">
                              <p class="first">a descriptive DTD without alternatives can be further generalized by relaxing
                  the constraints on the cardinality of each single element. The idea is that, by
                  forcing cardinalities, some documents could be considered invalid even if they
                  belong to the same class. Such validation is not meant to prevent errors, but to
                  describe existing resources.</p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">

                
                              <b>Descriptive No Order (DNO)</b>

              
                           </td>
                           <td valign="top">
                              <p class="first">constraints on order can be relaxed as well. Imposing an order is
                  extremely useful when invalid documents would obstruct a complex process, but
                  it makes much less sense when the goal is to identify subcomponents. A descriptive
                  schema is not meant to say where each object is located (a presentation layer
                  can change that property), but which objects are contained in the document
                  itself.</p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">

                
                              <b>Super Descriptive (SD)</b>

              
                           </td>
                           <td valign="top">
                              <p class="first">by relaxing the constraints on both cardinality and order, besides alternatives,
                  designers can create abstract DTDs which consider any object as a sequence of
                  repeatable and optional elements (as in the example). Apparently vague, these DTDs
                  are meant only to define the set of objects of the documents.</p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">

                
                              <b>(Un)Descriptive (UD)</b>

              
                           </td>
                           <td valign="top">
                              <p class="first">by relaxing every constraint, designers could say that anything includes anything.
                  Not useful in practice, these DTDs are mentioned only to complete our spectrum.</p>
                           </td>
                        </tr>
                     </table>
        
                  </p>
                  <p> Table 3 shows a very simple DTD declaration, transformed according to all these
          models. </p>
                  <div class="table">
                     <h5>Table 3: Descriptiveness levels example</h5>
                     <table summary="Descriptiveness levels example" border="1" width="85%">
                        <colgroup>
                           <col/>
                           <col/>
                        </colgroup>
                        <tbody>

              

                           <tr>
                              <td align="center">

                  
                                 <b>Descriptiveness level</b>

                
                              </td>
                              <td align="center">

                  
                                 <b>Content Model</b>

                
                              </td>
                           </tr>

              

                           <tr>
                              <td align="center">Prescriptive (PRE)</td>
                              <td align="center">&lt;!ELEMENT X (A, (B | C), D*)&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Descriptive No Alternatives (DNA)</td>
                              <td align="center">&lt;!ELEMENT X (A, (B?, C?), D*)&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Descriptive No Cardinality (DNC)</td>
                              <td align="center">&lt;!ELEMENT X (A*, (B*, C*), D*)&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Descriptive No Order (DNO)</td>
                              <td align="center">&lt;!ELEMENT X (A &amp; (B? &amp; C?) &amp; D*)&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">Super Descriptive (SD)</td>
                              <td align="center">&lt;!ELEMENT X (A | (B | C) | D)*&gt;</td>
                           </tr>

              

                           <tr>
                              <td align="center">(Un)Descriptive (UD)</td>
                              <td align="center">&lt;!ELEMENT X ANY&gt;</td>
                           </tr>

            
                        </tbody>
                     </table>
                  </div>
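                  <p>The rewritings of Table 3 can be reproduced mechanically. The Python sketch below
          is a minimal illustration under our own assumptions: a content model is a tree of tuples
          <tt class="code">(kind, body, indicator)</tt>, the kind <tt class="code">"all"</tt> stands
          for the <tt class="code">&amp;</tt> connector, and the function names are hypothetical. It
          covers only the constructs that appear in the table.</p>

```python
def el(name, occ=""):
    """An element particle with occurrence indicator '', '?', '*' or '+'."""
    return ("el", name, occ)


def group(kind, children, occ=""):
    """A group particle; kind is 'seq', 'choice' or 'all' (unordered)."""
    return (kind, list(children), occ)


def make_optional(node):
    """Make a particle optional while keeping its repeatability."""
    kind, body, occ = node
    return (kind, body, {"": "?", "+": "*"}.get(occ, occ))


def to_dna(node):
    """DNA: rewrite each choice (B | C) as a sequence (B?, C?)."""
    kind, body, occ = node
    if kind == "el":
        return node
    children = [to_dna(child) for child in body]
    if kind == "choice":
        return ("seq", [make_optional(c) for c in children], occ)
    return (kind, children, occ)


def to_dnc(node):
    """DNC: from a DNA model, make every element optional and repeatable."""
    kind, body, occ = node
    if kind == "el":
        return (kind, body, "*")
    return (kind, [to_dnc(c) for c in body], occ)


def to_dno(node):
    """DNO: from a DNA model, turn ordered sequences into unordered groups."""
    kind, body, occ = node
    if kind == "el":
        return node
    new_kind = "all" if kind == "seq" else kind
    return (new_kind, [to_dno(c) for c in body], occ)


def to_sd(node, top=True):
    """SD: a repeatable choice among all elements, with no indicators."""
    kind, body, occ = node
    if kind == "el":
        return (kind, body, "")
    return ("choice", [to_sd(c, top=False) for c in body], "*" if top else "")


# The prescriptive model of Table 3: X (A, (B | C), D*)
pre = group("seq", [el("A"), group("choice", [el("B"), el("C")]), el("D", "*")])
dna = to_dna(pre)
dnc = to_dnc(dna)
dno = to_dno(dna)
sd = to_sd(pre)
```
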
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="t3-2"/>Introducing the notion of Invariant Conversion</h3>
                  <p> Our previous analysis, based on the paper we presented in 2005 <b>
                        <span style="font-size:85%">
                           <a href="#DIGV05" name="fromDIGV05">[DIGV05]</a>
                        </span>
                     </b>, implicitly identified the DNO paradigm (with some important
          variations) as a good solution for designing generic schemas for document structures. Although
          we did not explicitly mention DNO, we studied situations where such
          descriptiveness (relaxing alternatives and order, but maintaining cardinality) is enough
          to express everything users need. In the same paper, we proposed some patterns and
          concluded that most of those descriptive situations can be modelled by adopting these and
          only these patterns. The goal of this work is to prove that such a pattern-based approach is
          complete according to a given notion of equivalence.</p>
                  <p> Our notion of equivalence is strictly related to the idea of 'levels of
          descriptiveness'. In particular, we exploit the strong separation between competencies and
          requirements implemented at each level. As stated before, choosing a level of
          descriptiveness depends on the context of use. Each level implies a set of constraints
          which can be relaxed, a set of properties which are worth maintaining, and a
          set of simplifications which can be made without losing expressivity. </p>
                  <p> Another important point about these levels is their partial ordering. The
          table we used to present them reflects that feature: the levels embody increasing degrees of
          generalization or, in reverse order, decreasing degrees of prescriptiveness. The more
          constraints and impositions are relaxed, the higher the level in the hierarchy.
          Thus, starting from the lowest level, PRE (where concrete and specific rules
          impose strong constraints on documents), our classification goes up to DNA (which
          relaxes the constraints on alternatives) which, in turn, goes up to DNC and DNO. In a sense,
          DNC and DNO can be considered to be at the same 'meta-level', since each relaxes constraints along a
          single dimension while keeping the other unchanged. By merging these two approaches, we
          obtain schemas at the SD level and, finally, at the (Un)Descriptive (UD) level, which accepts any document.
          </p>
                  <p>On the basis of such considerations, the term <i>fully equivalent
            at a given level</i> indicates that the information carried by two schemas is
          exactly the same <i>under the constraints implied by that level of
            descriptiveness</i>. In the extreme case, all schemas can be considered equivalent at the
          highest level (UD), since no constraint actually exists. Similarly, at the DNO level it does not matter
          whether a schema expresses alternatives, just as at the DNC level the number of
          occurrences of each element is not relevant. All these constraints are relevant at lower levels
          (down to the PRE level, where no rules can be relaxed) but can be neglected when dealing
          with schemas from a higher perspective.</p>
                  <p> We then define an <b>invariant schema conversion to a given
            level D</b> as a "<b>transformation from an arbitrary input
            schema A into a specific output schema B, which respects the constraints and rules imposed
            by level D and preserves the D-level descriptiveness of schema A in schema B</b>". </p>
                  <p>An example of invariant conversion at DNO level is shown below:</p>
                  <p>

          
                     <tt class="code">&lt;!ELEMENT bibliography (info, (title, subtitle?)?, (mixedcontent | entries))
            &gt;</tt>

        
                  </p>
                  <p>

          
                     <tt class="code">&lt;!ELEMENT bibliography (info &amp; title? &amp; subtitle? &amp;
            mixedcontent? &amp; entries?) &gt;</tt>

        
                  </p>
                  <p> Note that an invariant conversion makes sense only when associated with a specific
          level of descriptiveness. A converted schema does not express exactly the same information
          as the original one, but rather the same information relevant for that context. Consider for
          instance the example in Table 3: the schema at DNO level cannot be used to prevent the
          co-presence of the B and C elements (as the prescriptive one can), but it gives readers an
          exhaustive description of the elements contained in X, provided that their position and co-existence
          are not relevant. </p>
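                  <p>The <tt class="code">bibliography</tt> example above can be reproduced with a small
          flattening routine. The Python sketch below is only an approximation of a DNO conversion
          under our own assumptions: content models are tuples
          <tt class="code">(kind, body, indicator)</tt>, the result is an unordered list of
          elements with their indicators, and the function name
          <tt class="code">flatten_dno</tt> is hypothetical. It does not merge an element that
          occurs in several branches, and it is not the formal conversion studied later in this paper.</p>

```python
def flatten_dno(node, optional=False, repeatable=False):
    """Flatten a content model into an unordered list of (name, indicator).

    An element becomes optional if it, or any enclosing group, is optional,
    or if it is a member of a choice. Repeatability is inherited from
    enclosing groups in the same way, so cardinality is preserved.
    """
    kind, body, occ = node
    optional = optional or occ in ("?", "*")
    repeatable = repeatable or occ in ("*", "+")
    if kind == "el":
        if repeatable:
            return [(body, "*" if optional else "+")]
        return [(body, "?" if optional else "")]
    # Each member of a choice may be absent from a valid document.
    child_optional = optional or kind == "choice"
    result = []
    for child in body:
        result.extend(flatten_dno(child, child_optional, repeatable))
    return result


# bibliography (info, (title, subtitle?)?, (mixedcontent | entries))
bibliography = ("seq", [
    ("el", "info", ""),
    ("seq", [("el", "title", ""), ("el", "subtitle", "?")], "?"),
    ("choice", [("el", "mixedcontent", ""), ("el", "entries", "")], ""),
], "")
flat = flatten_dno(bibliography)
```

                  <p>The resulting list corresponds to the members of the unordered group in the
          converted declaration: <tt class="code">info</tt> stays mandatory, while every other
          element becomes optional.</p>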
                  <p> The above discussion about motivations and use-cases for DNO schemas
          can now be applied to DNO invariant conversion. By converting schemas into DNO-equivalent
          ones, we obtain instances which better fit descriptive scenarios. Our starting claim
          (about the completeness of our patterns according to a notion of equivalence) can now be
          reformulated: we want to prove that <b>for any schema there exists an
            invariant conversion at DNO level which transforms it into a schema exclusively based on
            our patterns</b>. In other words, any schema can be generalized (and
          normalized) into a new one which 'descriptively validates' the same set of documents and
          only uses our patterns: the point is that none of the simplifications and reductions applied to the input
          schema affects the constraints and rules imposed at DNO level. From now on, we will use
          the term 'invariant conversion' to indicate this class of reductions, implying that they hold
          at DNO level. </p>
                  <p>We applied invariant conversions to some element declarations of well-known markup
          languages, in order to produce their pattern-based 'relatives'. Once again, the point is
          not to raise objections to the original declarations, but to show how they are simplified
          in a DNO context. Consider for instance the <tt class="code">glosslist</tt> element of DocBook
          (version 4.5) whose content-model is shown in the first line of fig. <a href="#figexample1">6</a>:</p>
                  <div class="figure">
                     <a name="figexample1"/>
                     <h5>Figure 6: The <tt class="code">glosslist</tt> declaration in DocBook 4.5 and its pattern-based
            reformulation</h5>
                     <pre>&lt;!ELEMENT glosslist (blockinfo?, (title,titleabbrev?)?, glossentry+)&gt;
&lt;!ELEMENT glosslist (blockinfo?, title?, titleabbrev?, glossentry+)&gt;
&lt;!ELEMENT glosslist (blockinfo?, title?, titleabbrev?, glossentries?)&gt;
&lt;!ELEMENT glossentries (glossentry+)&gt;</pre>
                  </div>
                  <p>The '?' operator applied to the sequence of <tt class="code">title</tt> and <tt class="code">titleabbrev</tt>,
          the latter being optional in turn, looks rather unnatural: why did the designers need to apply it
          twice to the <tt class="code">titleabbrev</tt> element? The reason is that a
          <tt class="code">titleabbrev</tt> makes no sense without a title, and the designers wanted to prevent
          the existence of an isolated <tt class="code">titleabbrev</tt>. Although legitimate in a
          prescriptive environment, such a declaration can be replaced by a more general one in a
          descriptive scenario. The second declaration in fig. <a href="#figexample1">6</a>
          also validates a glossary list with only an abbreviated title, but it captures the same information
          at DNO level, namely the existence of a title and an abbreviated title for the list.</p>
                  <p>The presence of a sequence (of <tt class="code">glossentry</tt> elements) mixed with
          non-repeated elements (<tt class="code">blockinfo</tt> and the titles) also conflicts with our
          pattern-based vision. However, it is evident that the <tt class="code">glossentry</tt> elements are
          logically contained in a wrapper, even if that wrapper is not explicitly used in the
          document itself. The introduction of a virtual element, say <tt class="code">glossentries</tt>,
          makes the uniformity among the <tt class="code">glossentry</tt> elements explicit and processable, and
          distinguishes them from the rest of the <tt class="code">glosslist</tt> content-model. At the cost of
          some extra verbosity, the resulting schema clearly highlights the overall structure of a
          list of entries in a glossary. </p>
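                  <p>The step from the second to the third declaration of fig. 6 can be sketched as
          follows. The Python code is an illustration under our own assumptions: a content model is a
          list of <tt class="code">(name, indicator)</tt> pairs, wrapper names are supplied by the
          caller, and the function name <tt class="code">wrap_repeated</tt> is hypothetical.
          Following the figure, the wrapper is made optional and requires at least one wrapped element.</p>

```python
def wrap_repeated(items, wrapper_names):
    """Replace repeated elements with optional virtual wrappers.

    items: list of (name, indicator) pairs of a flat content model.
    wrapper_names: caller-supplied mapping from element name to the
    name of its virtual wrapper (an assumption of this sketch).
    Returns the new item list and the wrapper declarations.
    """
    new_items, wrappers = [], {}
    for name, occ in items:
        if occ in ("*", "+"):
            wrapper = wrapper_names[name]
            wrappers[wrapper] = [(name, "+")]   # e.g. glossentries (glossentry+)
            new_items.append((wrapper, "?"))
        else:
            new_items.append((name, occ))
    return new_items, wrappers


glosslist = [("blockinfo", "?"), ("title", "?"),
             ("titleabbrev", "?"), ("glossentry", "+")]
wrapped, decls = wrap_repeated(glosslist, {"glossentry": "glossentries"})
```
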
                  <p>The case of <tt class="code">glosslist</tt> is not isolated. Similar considerations
          apply to any DTD (or better, to any schema written in any language) whenever
          declarations designed for prescriptive purposes are generalized into descriptive ones.
          A similar example can be found in the Extreme Markup Conference DTD, by looking at
          the element <tt class="code">deflist</tt>, which encodes a two-column list of definitions. The
          corresponding declaration, shown in fig. <a href="#figexample2">7</a>, can be transformed
          into a pattern-based one by adding a virtual wrapper for the <tt class="code">def.item</tt> elements
          (which plays the same role as the previous <tt class="code">glossentries</tt>) and by relaxing
          the constraints on the co-presence of a term heading and a definition heading. Although a definition heading
          without a term heading makes little sense, and should be prevented when creating new documents,
          the second, descriptive declaration in fig. <a href="#figexample2">7</a>, which
          treats a heading as a record of two optional elements, <tt class="code">term.heading</tt> and
            <tt class="code">def.heading</tt>, equally expresses the basic information of a definition list.</p>
                  <div class="figure">
                     <a name="figexample2"/>
                     <h5>Figure 7: The <tt class="code">deflist</tt> declaration in the Extreme Markup DTD and its
            pattern-based reformulation</h5>
                     <pre>&lt;!ELEMENT deflist (title?, (term.heading, def.heading?)?, def.item+)&gt;
&lt;!ELEMENT deflist (title?,term.heading?, def.heading?, def.items?)&gt;
&lt;!ELEMENT def.items (def.item+)&gt;</pre>
                  </div>
                  <p>Prescriptive constraints can also be relaxed in the TEILite specification, keeping
          the declarations invariant at DNO level. Consider the element <tt class="code">publicationStmt</tt>,
          which groups information about the publication and distribution of an encoded text. Its
          content model, shown in fig. <a href="#figexample3">8</a>, is meant to define two
          different forms of statement, which cannot be mixed with each other. A
          <tt class="code">publicationStmt</tt> is a choice between two repeated sequences: each consists of
          either a paragraph or another publication-related element, followed by some common information
          (captured by the <tt class="code">%m.Incl;</tt> entity). The complex structure of this content-model derives from the
          need to validate homogeneous sequences, which exclusively start with paragraphs
            <i>or</i> with the other elements. A descriptive declaration
          does not need to prevent such co-existence, but aims at indicating which elements appear
          and how many times. By also transforming alternatives into records of optional elements,
          according to the theory discussed in our previous paper, this declaration could even be
          converted into the very general schema shown in fig. <a href="#figexample3">8</a>. It could
          be further improved by introducing wrappers, which make explicit the relationship among
          uniform and repeatable elements. </p>
                  <div class="figure">
                     <a name="figexample3"/>
                     <h5>Figure 8: The <tt class="code">publicationStmt</tt> declaration in TEILite and its pattern-based
            reformulation</h5>
                     <pre>&lt;!ELEMENT publicationStmt ( ( p, (%m.Incl;)*)+ |
    ( (publisher | distributor | authority | pubPlace | address | idno | availability | date ),
      (%m.Incl;)*)+ )&gt;</pre>
                     <pre>&lt;!ELEMENT publicationStmt ( ( p?, publisher?,
    distributor?, authority?, pubPlace?, address?, idno?, availability?, date?),
    (%m.Incl;)*)+&gt;</pre>
                  </div>
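                  <p>As a rough illustration of the relaxation just described, the following Python sketch
          turns a flat list of alternative element names into a record of optional elements. The
          function name <tt class="code">choice_to_record</tt> and the list-based representation are
          assumptions made for this example only; nested content models and parameter entities such
          as <tt class="code">%m.Incl;</tt> are not handled.</p>

```python
def choice_to_record(alternatives):
    """Relax a flat choice (a | b | c) into a record of optional elements (a?, b?, c?)."""
    seen = []
    for name in alternatives:
        # A record lists each element once; repeated alternatives are dropped.
        if name not in seen:
            seen.append(name)
    return "(" + ", ".join(name + "?" for name in seen) + ")"


if __name__ == "__main__":
    # Hypothetical input: the paragraph alternative plus the eight
    # publication-related alternatives of publicationStmt.
    names = ["p", "publisher", "distributor", "authority", "pubPlace",
             "address", "idno", "availability", "date"]
    print(choice_to_record(names))
```

                  <p>Run on the nine names above, the sketch yields a group of nine optional elements, which
          corresponds to the first group of the reformulated <tt class="code">publicationStmt</tt>
          declaration in fig. <a href="#figexample3">8</a>.</p>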
               </div>
            </div>
            <div class="section">
               <h2>
                  <a name="t4"/>Invariant conversion towards pattern-based documents: a formal analysis</h2>
               <div class="subsec1">
                  <h3>
                     <a name="t4-1"/>Grammars and formal languages</h3>
                  <p>In order to analyze patterns in depth, we performed a formal analysis based on language
          theory. In formal language theory, a language is defined as a set of words built over a
          set of terminal symbols, and grammars define rules to combine terminals through
          productions. Our idea is to derive properties of validation languages (whether
          pattern-based or not) by analyzing the grammars which produce these languages.</p>
                  <p>We will present a formal study of invariant conversions for DTDs. We will show that,
          for any DTD, there exists an invariant conversion (once again, at DNO level) which transforms
          it into a pattern-based one.</p>
                  <p> We chose DTDs because they are simpler, easier to read, and more direct. The
          downside is that DTDs are less expressive than other schema
          languages, and many schemas (for instance, all those written in XML-Schema <b>
                        <span style="font-size:85%">
                           <a href="#TBMM01" name="fromTBMM01">[TBMM01]</a>
                        </span>
                     </b> or RelaxNG <b>
                        <span style="font-size:85%">
                           <a href="#CM01" name="fromCM01">[CM01]</a>
                        </span>
                     </b>) seem to be cut off from our
          discussion. </p>
                  <p>In this connection, we will later discuss the differences between DTDs, XML-Schema and
          RelaxNG in terms of descriptiveness and invariant conversion. We will conclude that the
          advanced features of XML-Schema and RelaxNG do not impact the descriptiveness of a schema,
          and that invariant conversion is valid for those languages as well. A proof on DTDs is therefore
          meant to establish the completeness and soundness of a general conversion approach,
          independent of the actual schema language.</p>
                  <p>An empirical and quite surprising result is also interesting on this point: <b>
                        <span style="font-size:85%">
                           <a href="#BMNS05" name="fromBMNS05">[BMNS05]</a>
                        </span>
                     </b> discovered that 85% of the existing XML-Schemas are equivalent to
          DTDs. Although they took random schemas, which are probably not representative of all
          the schemas currently in use (one reason, for instance, is that many tools
          automatically generate fairly elaborate schemas), they pointed out a very interesting
          trend, i.e. an undeniable strength of DTDs in everyday practice. Even an analysis
          restricted to DTDs alone would therefore be a partial but interesting result.</p>
                  <p>Our proof consists of three main phases: (i) introducing a grammar which produces all
          the possible DTDs, (ii) introducing a grammar which produces all the DTDs based on our
          patterns (described in our previous work <b>
                        <span style="font-size:85%">
                           <a href="#DIGV05" name="fromDIGV05">[DIGV05]</a>
                        </span>
                     </b>) and (iii) presenting a
          reduction algorithm which applies an invariant conversion on each production of the
          grammars. In the following subsections, we introduce the general grammar G for DTDs and we
          propose our grammar P able to produce all the DTDs which use only our patterns, postponing
          the comparison between related languages to section 5.</p>
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="t4-2"/>The general grammar G</h3>
                  <p>The general grammar G, provided by the W3C <b>
                        <span style="font-size:85%">
                           <a href="#BPSMM00" name="fromBPSMM00">[BPSMM00]</a>
                        </span>
                     </b>, produces all
          the possible DTDs. To simplify the analysis, we extracted a subset of its rules and worked only on them. In
          particular, we are interested in the element type declarations (summarized in figure <a href="#fig1">9</a>), since they define the overall structure of a document.</p>
                  <div class="figure">
                     <a name="fig1"/>
                     <h5>Figure 9: General grammar G</h5>
                     <pre>[45] elementdecl ::= '&lt;!ELEMENT' S Name S contentspec S? '&gt;'
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
[51] Mixed       ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
[47] children    ::= ( choice | seq ) ( '?' | '*' | '+')?
[48] cp          ::= ( Name | choice | seq ) ( '?' | '*' | '+')?
[49] choice      ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
[50] seq         ::= '(' S? cp ( S? ',' S? cp )* S? ')'</pre>
                  </div>
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="t4-3"/>Our grammar P</h3>
                  <p>The grammar P aims at formally expressing the constraints and composition rules
          that hold over pattern-based documents. Its production rules are summarized in
          figure <a href="#fig2">10</a>. Productions [p01-p08] are used to declare the seven
          different patterns, while the remaining ones are introduced to specify their content
          models.</p>
                  <div class="figure">
                     <a name="fig2"/>
                     <h5>Figure 10: Our pattern-based grammar P</h5>
                     <pre>[p01] elementdecl ::= markerelementdecl | atomelementdecl | blockelementdecl
                    | inlineelementdecl | recordelementdecl | containerelementdecl
                    | tableelementdecl
[p02] markerelementdecl    ::= '&lt;!ELEMENT' S MarkerName S markercontentspec S? '&gt;'
[p03] atomelementdecl      ::= '&lt;!ELEMENT' S AtomName S atomcontentspec S? '&gt;'
[p04] blockelementdecl     ::= '&lt;!ELEMENT' S BlockName S blockcontentspec S? '&gt;'
[p05] inlineelementdecl    ::= '&lt;!ELEMENT' S InlineName S inlinecontentspec S? '&gt;'
[p06] recordelementdecl    ::= '&lt;!ELEMENT' S RecordName S recordcontentspec S? '&gt;'
[p07] containerelementdecl ::= '&lt;!ELEMENT' S ContainerName S containercontentspec S? '&gt;'
[p08] tableelementdecl     ::= '&lt;!ELEMENT' S TableName S tablecontentspec S? '&gt;'
[p09] markercontentspec    ::= 'EMPTY'
[p10] atomcontentspec      ::= '(' S? '#PCDATA' S?')'
[p11] blockcontentspec     ::= maicontentspec
[p12] inlinecontentspec    ::= maicontentspec
[p13] maicontentspec       ::= '(' S? '#PCDATA' (S? '|' S? maiName S?)+ S? ')*'
[p14] recordcontentspec    ::= '(' S? mabrctName '?'? ( S? '&amp;' S? mabrctName '?'? S? )* S? ')'
[p15] containercontentspec ::= '(' S? mabrctName ( S? '|' S? mabrctName S? )* S? ')*'
[p16] tablecontentspec     ::= '(' S? mabrctName S? ')*'
[p17] maiName    ::= MarkerName | AtomName | InlineName
[p18] mabrctName ::= MarkerName | AtomName | BlockName | RecordName | ContainerName | TableName
          </pre>
                  </div>
                  <p>We perform some initial simplifications to make the analysis simpler and clearer: for
          instance, we omit attribute declarations, and we do not consider some unusual
          declarations such as <tt class="code">(#PCDATA)*</tt> (which can be replaced by the equivalent
            <tt class="code">(#PCDATA)</tt>). Moreover, we do not consider the terminal symbol
          '<tt class="code">+</tt>', both for brevity and because it can be assimilated to the
          terminal '<tt class="code">*</tt>' from a descriptive perspective.</p>
                  <p>One more point deserves explanation: to better formalize the DNO model, we introduce
          the terminal symbol '<tt class="code">&amp;</tt>', which in SGML syntax means that all the
          elements must occur, in any order.</p>
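                  <p>To make the meaning of this connector concrete, here is a minimal Python sketch of an
          order-insensitive check for a record content model. The dictionary representation (element
          name mapped to an "optional" flag) and the function name
          <tt class="code">matches_record</tt> are assumptions of this sketch, not part of grammar P.</p>

```python
from collections import Counter


def matches_record(children, spec):
    """Check child element names against a record, ignoring their order.

    spec maps each declared element name to True if it is optional ('?')
    and to False if it is required.
    """
    counts = Counter(children)
    # Every child must be declared in the record.
    if any(name not in spec for name in counts):
        return False
    # No element of a record may be repeated.
    if any(count != 1 for count in counts.values()):
        return False
    # Every required element must be present.
    return all(optional or name in counts for name, optional in spec.items())


if __name__ == "__main__":
    # Hypothetical record with one required and one optional element.
    spec = {"term": False, "def": True}
    print(matches_record(["def", "term"], spec))  # order is irrelevant
    print(matches_record(["def"], spec))          # required element missing
```

                  <p>The check deliberately looks only at which elements occur and how many times, which is
          the information a descriptive, order-free reading of a declaration retains.</p>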
               </div>
            </div>
            <div class="section">
               <h2>
                  <a name="t5"/>Invariant conversion between DTDs</h2>
               <p> Our proof is concluded by a point-to-point comparison between the languages generated
        by the two grammars, G and P. The goal is to show the existence of an invariant conversion, at
        DNO level, from any DTD generated by the grammar G to a DTD generated by P. This property
        can be formally split into two sub-propositions: </p>
               <p>

        
                  <b>Proposition 1</b>

        
                  <i>Let L(P) and L(G) be the languages generated by our
          grammar P and by the general grammar G, respectively. L(P) is the set of all possible pattern-based DTDs,
          while L(G) is the set of all possible DTDs. <span style="font-family: 'Lucida Sans Unicode'">
                        <mml:math display="block" overflow="scroll">

            
                           <mml:mstyle id="x1-10002r1" class="label"/>

            
                           <mml:mi>L</mml:mi>

            
                           <mml:mrow>

              
                              <mml:mo class="MathClass-open">(</mml:mo>

              
                              <mml:mrow>

                
                                 <mml:mi>P</mml:mi>

              
                              </mml:mrow>

              
                              <mml:mo class="MathClass-close">)</mml:mo>

            
                           </mml:mrow>

            
                           <mml:mo class="MathClass-rel">&#8834;</mml:mo>

            
                           <mml:mi>L</mml:mi>

            
                           <mml:mrow>

              
                              <mml:mo class="MathClass-open">(</mml:mo>

              
                              <mml:mrow>

                
                                 <mml:mi>G</mml:mi>

              
                              </mml:mrow>

              
                              <mml:mo class="MathClass-close">)</mml:mo>

            
                           </mml:mrow>

          
                        </mml:math>
                     </span>
        
                  </i>

      
               </p>
               <p>
        
                  <b>Proposition 2</b>
        
                  <i>Let L(G) and L(P) be as above. <span style="font-family: 'Lucida Sans Unicode'">
                        <mml:math display="block" overflow="scroll">

            
                           <mml:mstyle id="x1-10004r2" class="label"/>

            

            
                           <mml:mo class="MathClass-op">&#8704;</mml:mo>

            
                           <mml:mi>d</mml:mi>

            
                           <mml:mo class="MathClass-rel">&#8712;</mml:mo>

            
                           <mml:mi>L</mml:mi>

            
                           <mml:mrow>

              
                              <mml:mo class="MathClass-open">(</mml:mo>

              
                              <mml:mrow>

                
                                 <mml:mi>G</mml:mi>

              
                              </mml:mrow>

              
                              <mml:mo class="MathClass-close">)</mml:mo>

            
                           </mml:mrow>

            
                           <mml:mspace class="nbsp"/>

            
                           <mml:mo class="MathClass-op">&#8707;</mml:mo>

            
                           <mml:mi>p</mml:mi>

            
                           <mml:mo class="MathClass-rel">&#8712;</mml:mo>

            
                           <mml:mi>L</mml:mi>

            
                           <mml:mrow>

              
                              <mml:mo class="MathClass-open">(</mml:mo>

              
                              <mml:mrow>

                
                                 <mml:mi>P</mml:mi>

              
                              </mml:mrow>

              
                              <mml:mo class="MathClass-close">)</mml:mo>

            
                           </mml:mrow>

            
                           <mml:mspace class="nbsp"/>

            
                           <mml:mspace class="nbsp"/>
            
          
                        </mml:math>
                     </span> | for some reduction algorithm r: L(G) &#8594; L(P), <span style="font-family: 'Lucida Sans Unicode'">
                        <mml:math overflow="scroll">

            
                           <mml:mi>d</mml:mi>

            
                           <mml:mover>

              
                              <mml:mo>&#8594;</mml:mo>

              
                              <mml:mtext>r</mml:mtext>

            
                           </mml:mover>

            
                           <mml:mi>p</mml:mi>

          
                        </mml:math>
                     </span> and p and d are equally descriptive at DNO level. </i> The symbol
          <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> indicates that <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mi>d</mml:mi>

        
                     </mml:math>
                  </span> is reduced to <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mi>p</mml:mi>

        
                     </mml:math>
                  </span> by applying the function <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mi>r</mml:mi>

        
                     </mml:math>
                  </span>.
      
               </p>
               <p>
                  <i> Proof. </i> We want to demonstrate that, for
        any DTD producible from G, there exists a pattern-based DTD, producible from P, which is equally
        descriptive at DNO level. To this end, we give a constructive proof: we present a
        reduction algorithm which, applied to a DTD, generates a pattern-based DTD that is equally
        descriptive at DNO level. </p>
               <p> The algorithm for the transformation of a generic DTD into a pattern-based DTD consists
        of two phases, called <i>element types detection</i> and
          <i>refinement</i>. The first phase shows how each element X,
        generated by G, can be mapped to an element X' generated by P and, in this way, associated
        with one of the seven possible patterns. The demonstration exhaustively analyzes the
        derivations generated by productions [46-51] of grammar G and matches them with equivalent
        (for derivations) productions [p01-p18] of grammar P. The second phase performs a cross check
        on the identified patterns in order to ensure that all pattern constraints (see Table 2) are
        respected; any violation is corrected by applying a set of four reduction rules.</p>
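               <p> As an informal companion to the case analysis of this section, the following Python sketch
        guesses the pattern of an element from the outermost shape of its content specification. It is
        only a sketch of the detection phase, not the reduction algorithm itself: the function names
        <tt class="code">top_level_connector</tt> and <tt class="code">detect_pattern</tt> are
        hypothetical, the refinement phase is not modelled, <tt class="code">ANY</tt> is left
        unclassified, and treating a repeated sequence marked with '<tt class="code">+</tt>' like one
        marked with '<tt class="code">*</tt>' is an assumption extrapolated from the simplification
        stated for grammar P.</p>

```python
def top_level_connector(group):
    """Return '|' or ',' for the outermost group, or None for a single particle."""
    depth = 0
    for ch in group:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 1 and ch in "|,":
            return ch
    return None


def detect_pattern(name, contentspec):
    """First-phase guess of the pattern for element `name` (no refinement)."""
    spec = "".join(contentspec.split())  # drop all whitespace
    if spec == "EMPTY":
        return "Marker"
    if spec.startswith("(#PCDATA"):
        names = [n for n in spec.rstrip("*").strip("()").split("|") if n != "#PCDATA"]
        if not names:
            return "Atom"  # (#PCDATA)* is reduced to (#PCDATA)
        # An element that lists itself in its mixed content is an Inline.
        return "Inline" if name in names else "Block"
    if not spec.startswith("("):
        return None  # e.g. ANY, which this sketch does not cover
    indicator = spec[-1] if spec[-1] in "?*+" else ""
    group = spec[:-1] if indicator else spec
    repeated = indicator in ("*", "+")  # assumption: '+' is treated like '*'
    if top_level_connector(group) == "|":
        return "Container" if repeated else "Record"
    return "Table" if repeated else "Record"


if __name__ == "__main__":
    print(detect_pattern("deflist", "(title?, (term.heading, def.heading?)?, def.item+)"))
    print(detect_pattern("b", "(#PCDATA | b | i)*"))
```

               <p> Only the outermost connector and occurrence indicator are inspected; whether the nested
        content particles satisfy the constraints of the chosen pattern is exactly what the refinement
        phase has to verify afterwards.</p>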
               <p>
                  <i>Element types detection</i>. Now we apply an exhaustive
        analysis, deriving the language generated by grammar G and identifying the correspondence
        with productions and language generated by grammar P: </p>
               <p> By applying productions [45] and then [46], if we derive in G contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span>'EMPTY', equivalently we apply productions [p01], [p02] and [p09] in P and we
        derive markercontentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> 'EMPTY'. In this case the element X is associated with a <b>Marker</b>. </p>
               <p> By applying productions [45], [46] and [51], if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> Mixed <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (#PCDATA), equivalently we apply productions [p01], [p03] and [p10] in P and we
        derive atomcontentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (#PCDATA). In this case the element X is associated with an <b>Atom</b>. </p>
               <p> By applying productions [45], [46] and [51], if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> Mixed <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (#PCDATA | Name | ... | Name)*, then let n = Name | ... | Name, or in more
        general form n = N<sub>1</sub> | ... | N<sub>m</sub>, with m &#8805; 0. If n = &#216;, then
        contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span>
                  <sup>*</sup> (#PCDATA)* . Relaxing constraints, we reduce (#PCDATA)*
          <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> (#PCDATA) (case treated before). If n <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8800; </mml:mo>

        
                     </mml:math>
                  </span> &#216;, then contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> Mixed <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (#PCDATA | N<sub>1</sub> | ... | N<sub>m</sub>)* . In this situation, if <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8707;</mml:mo>

        
                     </mml:math>
                  </span> i <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8712;</mml:mo>

        
                     </mml:math>
                  </span> {1, ..., m} <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8715;</mml:mo>

        
                     </mml:math>
                  </span>' X = N<sub>i</sub> (that is, X includes itself in its content model), X is
        associated with an <b>Inline</b>; while if n <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8800; </mml:mo>

        
                     </mml:math>
                  </span> &#216; and <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8708; </mml:mo>

        
                     </mml:math>
                  </span> i <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8715;' </mml:mo>

        
                     </mml:math>
                  </span>X = N<sub>i</sub>, with i = 1, ..., m, X is associated with a <b>Block</b>. In the first case, we apply in grammar P productions
        [p01], [p05], [p12], [p13] and [p17], while in the second case productions [p01], [p04],
        [p11], [p13] and [p17]. Since productions [p11], [p12] and [p17] require that <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel"> &#8704;</mml:mo>

        
                     </mml:math>
                  </span> i, i=1, ..., m, N<sub>i</sub>
        
                  <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8712;</mml:mo>

        
                     </mml:math>
                  </span> MAI = {MarkerName, AtomName, InlineName}, if this constraint is not respected,
        in the refinement phase appropriate reduction rules must be applied. </p>
               <p> By applying productions [45], [46], [47] and [49], if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> choice <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (cp | cp | ... | cp), we relax constraints and associate X with a <b>Record</b>, reducing (cp | cp | ... | cp) <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> (cp? &amp; cp? &amp; ... &amp; cp?). In this case, the equivalent
        productions in grammar P are [p01], [p06], [p14] and [p18]. The constraint is <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8704;</mml:mo>

        
                     </mml:math>
                  </span>cp, cp <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8712; </mml:mo>

        
                     </mml:math>
                  </span>MABRCT = {MarkerName, AtomName, BlockName, RecordName, ContainerName, TableName}.
        As before, if this constraint is violated, appropriate reduction rules must be applied in
        the refinement phase. </p>
               <p> By applying productions [45], [46], [47] and [49], if we derive: contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> choice* <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (cp | cp | ... | cp)*, X is associated with a <b>Container</b> (applying the equivalent productions [p01], [p07], [p15] and [p18]). In
        this case, if the constraint <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8704;</mml:mo>

        
                     </mml:math>
                  </span>cp, cp <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8712;</mml:mo>

        
                     </mml:math>
                  </span> MABRCT = {MarkerName, AtomName, BlockName, RecordName, ContainerName,
        TableName} is violated, appropriate reduction rules must be applied in the refinement phase. </p>
               <p> By applying productions [45], [46], [47] and [49], if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> choice+, then we relax constraints, reducing choice+ <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> choice* (case treated before); if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> choice?, we relax constraints, reducing choice? <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> ( cp? &amp; cp? &amp; ... &amp; cp?) (case treated before).</p>
               <p> By applying productions [45], [46], [47] and [50], if we derive: contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> seq <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (cp, cp, ..., cp), then we relax constraints, reducing (cp, cp, ..., cp) to (cp
        &amp; cp &amp; ... &amp; cp). In this situation X is associated with a <b>Record</b> (applying the equivalent productions [p01], [p06], [p14] and
        [p18]). </p>
               <p> By applying productions [45], [46], [47] and [50], if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> seq* <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (cp, cp, ..., cp)*, then we equivalently apply productions [p01], [p08], [p16] and
        [p18] in grammar P: X is associated with a <b>Table</b>, and a
        record wrapper is introduced for cp, ..., cp.</p>
               <p> Finally, by applying productions [46], [47] and [50], if we derive contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> seq? <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (cp, ..., cp)?, we relax constraints, reducing (cp, ..., cp)? <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> (cp? &amp; ... &amp; cp?) (case treated before); while, if we derive
        contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> children <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> seq+ <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> (cp, ..., cp)+, then we relax constraints, reducing (cp, ..., cp)+ <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math display="inline" overflow="scroll">

          
                        <mml:mover>

            
                           <mml:mo>&#8594;</mml:mo>

            
                           <mml:mtext>r</mml:mtext>

          
                        </mml:mover>

        
                     </mml:math>
                  </span> (cp, ..., cp)* (case treated before).</p>
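               <p> The relaxations applied to sequence content models in the cases above can be summarized as a small lookup. The following Python sketch is only illustrative: the function name <tt class="code">relax_sequence</tt> and the tuple encoding of the result are assumptions introduced here, not part of grammars G or P. The optional case returns no pattern because the text defers it to a case treated earlier.</p>

```python
def relax_sequence(items, occurrence=""):
    """Relax a DTD sequence (cp, ..., cp) followed by an occurrence operator.

    Returns a tuple (connector, items, occurrence, pattern), where connector is
    "interleave" for the unordered operator and "sequence" for the comma.
    The encoding is hypothetical and chosen only for this illustration.
    """
    if occurrence == "":
        # (cp, ..., cp) is relaxed to an unordered group: Record
        return ("interleave", list(items), "", "Record")
    if occurrence == "?":
        # (cp, ..., cp)? is relaxed to an unordered group of optional items;
        # the pattern is decided by the earlier case, so none is returned here
        return ("interleave", [item + "?" for item in items], "", None)
    if occurrence in ("*", "+"):
        # (cp, ..., cp)+ is relaxed to (cp, ..., cp)*: Table with a record wrapper
        return ("sequence", list(items), "*", "Table")
    raise ValueError("unknown occurrence operator: " + repr(occurrence))


print(relax_sequence(["title", "author"]))
print(relax_sequence(["title", "author"], "+"))
```

               <p> Both repeated forms map to the same result, which reflects the relaxation of <tt class="code">+</tt> to <tt class="code">*</tt> described above.</p>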
               <p> In the previous analysis, we have left out the derivation obtained by applying
        productions [45] and [46]: contentspec <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">

          
                        <mml:mo class="MathClass-rel">&#8658; </mml:mo>

        
                     </mml:math>
                  </span> 'ANY'. The reason is that, in practice, this approach is rarely used because it
        allows too much freedom and therefore undermines the benefits that derive from defining
        document structures. We mention it only for the sake of completeness, and we associate it with
        a Container wrapper that can contain, directly or indirectly (by means of other wrappers), any
        other element. </p>
               <p> We note that, during this phase, the presence of 'cp' requires, in many situations, the
        introduction of appropriate wrappers. </p>
               <p> The result of this phase is the reduction of each element to one (and only one) of the
        seven patterns or, alternatively, the introduction of a wrapper that assumes the type
        of a given pattern. Consequently, to complete our
        reduction, we need to check that all elements respect the inclusion constraints summarized
        in Table 2. This is performed by the refinement phase described next. </p>
               <p>
                  <i>Refinement</i>. The second phase performs a cross-check
        between elements to ensure that all constraints are observed. Table 4 marks with an X the
        permitted combinations in element declarations, and marks with (x), x = 0, ..., 4, the
        situations in which reduction rule (x) must be applied. </p>
               <div class="table">
                  <h5>Table 4: Composition rules over patterns and reductions</h5>
                  <table summary="Composition rules over patterns and reductions" border="1" width="85%">
                     <colgroup>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                     </colgroup>
                     <tbody>

            

                        <tr>
                           <td align="center">&#160;</td>
                           <td align="center">EMPTY</td>
                           <td align="center">Text</td>
                           <td align="center">Marker</td>
                           <td align="center">Atom</td>
                           <td align="center">Block</td>
                           <td align="center">Inline</td>
                           <td align="center">Record</td>
                           <td align="center">Container</td>
                           <td align="center">Table</td>
                        </tr>

            

                        <tr>
                           <td align="center">Marker</td>
                           <td align="center">X</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                        </tr>

            

                        <tr>
                           <td align="center">Atom</td>
                           <td align="center">(0)</td>
                           <td align="center">X</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                        </tr>

            

                        <tr>
                           <td align="center">Block</td>
                           <td align="center">(0)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">(1)</td>
                           <td align="center">X</td>
                           <td align="center">(3)</td>
                           <td align="center">(3)</td>
                           <td align="center">(3)</td>
                        </tr>

            

                        <tr>
                           <td align="center">Inline</td>
                           <td align="center">(0)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">(1)</td>
                           <td align="center">X</td>
                           <td align="center">(4)</td>
                           <td align="center">(4)</td>
                           <td align="center">(4)</td>
                        </tr>

            

                        <tr>
                           <td align="center">Record</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">(2)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                        </tr>

            

                        <tr>
                           <td align="center">Container</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">(2)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                        </tr>

            

                        <tr>
                           <td align="center">Table</td>
                           <td align="center">(0)</td>
                           <td align="center">(0)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">(2)</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                           <td align="center">X</td>
                        </tr>

          
                     </tbody>
                  </table>
               </div>
               <p> For each element declaration, one must check that none of the elements contained in its content
        model violates the constraints expressed in Table 2. Every time an element's content
        model breaks a rule (e.g. an inline declaration contains a block in its content model), the
        corresponding reduction rule has to be applied according to the previous table (in this
        example, reduction rule (1)). Sometimes a reduction rule changes
        the type classification of an element, which requires a complete re-check of all element
        declarations and makes the algorithm iterative. We note that, during the reduction process, we
        relax some constraints prescribed in grammar G; in this way, the set of documents accepted
        by the pattern-based DTDs generated by P is at least as large as the set of documents
        accepted by the original DTD. Each reduction rule is
        specified below. </p>
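               <p> As an illustration, the cells of Table 4 can be encoded as a lookup from a (containing pattern, contained item) pair to either "permitted" or a reduction rule number. The Python sketch below transcribes the table row by row; the names <tt class="code">TABLE4</tt> and <tt class="code">check</tt> are hypothetical and are used only for this example.</p>

```python
# Column order of Table 4: what appears inside the declaration.
CONTAINED = ["EMPTY", "Text", "Marker", "Atom", "Block",
             "Inline", "Record", "Container", "Table"]

# One row per containing pattern; "X" means permitted,
# a digit is the number of the reduction rule to apply.
ROWS = {
    "Marker":    "X 0 0 0 0 0 0 0 0",
    "Atom":      "0 X 0 0 0 0 0 0 0",
    "Block":     "0 X X X 1 X 3 3 3",
    "Inline":    "0 X X X 1 X 4 4 4",
    "Record":    "0 0 X X X 2 X X X",
    "Container": "0 0 X X X 2 X X X",
    "Table":     "0 0 X X X 2 X X X",
}

TABLE4 = {parent: dict(zip(CONTAINED, row.split()))
          for parent, row in ROWS.items()}


def check(parent, child):
    """Return None if the combination is permitted, else the rule number."""
    cell = TABLE4[parent][child]
    return None if cell == "X" else int(cell)


# The example from the text: an inline declaration containing a block.
print(check("Inline", "Block"))
```

               <p> A result of 0 corresponds to the combinations that the classification phase already excludes.</p>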
               <table border="0" cellpadding="8" class="deflist">
                  <tr>
                     <td valign="top">

            
                        <i>Reduction rule (0)</i>

          
                     </td>
                     <td valign="top">
                        <p class="first">Cases marked with this reduction rule can never occur, because the definition
              of the element excludes them. For example, if an element has been recognized as a
                <i>Marker</i>, its declaration cannot contain
              anything other than the keyword 'EMPTY'.</p>
                     </td>
                  </tr>
                  <tr>
                     <td valign="top">

            
                        <i>Reduction rule (1)</i>

          
                     </td>
                     <td valign="top">
                        <p class="first">If a block element is found in the declaration of another block or
              inline element, a two-step reduction has to be applied: <ul>
                              <li>the block element found inside the other element is no longer
                    considered a block but an inline;</li>
                              <li>since the type classification of an element has changed, all the element
                    declarations have to be re-checked against the new classification.</li>
                           </ul>
            
                        </p>
                     </td>
                  </tr>
                  <tr>
                     <td valign="top">

            
                        <i>Reduction rule (2)</i>

          
                     </td>
                     <td valign="top">
                        <p class="first">If an inline element appears inside a record, a container or a table
              element, one must create a new block element that replaces the inline element
              in all the wrong positions; the content model of the new block element
              contains only the inline element. In this case, a re-check is not needed.</p>
                     </td>
                  </tr>
                  <tr>
                     <td valign="top">

            
                        <i>Reduction rule (3)</i>

          
                     </td>
                     <td valign="top">
                        <p class="first">If a record, a container or a table element appears in the content model
              of a block element, a three-step reduction has to be applied: <ul>
                              <li>remove the <tt class="code">offending-element</tt> from the
                    <tt class="code">block-element</tt> containing it;</li>
                              <li>create a new container-wrapper with content model of type
                      <tt class="code">(block-element | offending-element)*</tt>;</li>
                              <li>substitute everywhere the <tt class="code">block-element</tt> with the new
                    container-wrapper.</li>
                           </ul> In this case, a re-check is not needed.</p>
                     </td>
                  </tr>
                  <tr>
                     <td valign="top">

            
                        <i>Reduction rule (4)</i>

          
                     </td>
                     <td valign="top">
                        <p class="first">If a record, a container or a table element appears in the content model
              of an inline element, a three-step reduction has to be applied: <ul>
                              <li>remove the <tt class="code">offending-element(s)</tt> from the
                    <tt class="code">inline-element</tt> containing them;</li>
                              <li>push the removed <tt class="code">offending-element(s)</tt> into all the ancestor block
                    elements that contain the <tt class="code">inline-element</tt> as a child or descendant;</li>
                              <li>since the content model of some elements has been modified, all the element
                    declarations have to be re-checked. </li>
                           </ul>
            
                        </p>
                     </td>
                  </tr>
               </table>
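               <p> The iterative nature of the refinement can be sketched in Python for reduction rules (1) and (2) only; rules (3) and (4) are omitted for brevity. The data model (a dictionary from element names to patterns and a dictionary from element names to lists of child names), the function name <tt class="code">refine</tt> and the <tt class="code">_block</tt> suffix used for wrapper names are all assumptions made for this illustration.</p>

```python
def refine(patterns, content):
    """Apply reduction rules (1) and (2) until no reclassification occurs.

    patterns: element name -> pattern name (hypothetical data model)
    content:  element name -> list of child element names
    """
    patterns = dict(patterns)
    content = {name: list(children) for name, children in content.items()}
    changed = True
    while changed:                      # rule (1) forces a complete re-check
        changed = False
        for parent in list(content):
            for i, child in enumerate(content[parent]):
                p, c = patterns[parent], patterns[child]
                if c == "Block" and p in ("Block", "Inline"):
                    # Rule (1): the nested block is reclassified as inline.
                    patterns[child] = "Inline"
                    changed = True
                elif c == "Inline" and p in ("Record", "Container", "Table"):
                    # Rule (2): replace the inline with a new block wrapper
                    # whose content model contains only that inline.
                    wrapper = child + "_block"   # hypothetical naming scheme
                    patterns.setdefault(wrapper, "Block")
                    content.setdefault(wrapper, [child])
                    content[parent][i] = wrapper
    return patterns, content


patterns = {"doc": "Container", "p": "Block", "note": "Block", "em": "Inline"}
content = {"doc": ["p", "em"], "p": ["note", "em"], "note": ["em"], "em": []}
new_patterns, new_content = refine(patterns, content)
print(new_patterns)
print(new_content)
```

               <p> In this example <tt class="code">note</tt> is reclassified as an inline by rule (1), and the <tt class="code">em</tt> element that appears directly inside the container is replaced by a block wrapper by rule (2).</p>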
               <p>This concludes the proof. <i>QED.</i>
               </p>
            </div>
            <div class="section">
               <h2>
                  <a name="t6"/>Extending the notion of invariant conversion: XML-Schema and RelaxNG</h2>
               <p>DTDs are widely used among designers because of their simplicity and plainness.
        Surprisingly, although more powerful validation languages exist, most of the existing
        schemas are still equivalent to DTDs <b>
                     <span style="font-size:85%">
                        <a href="#BMNS05" name="fromBMNS05">[BMNS05]</a>
                     </span>
                  </b>. Hence, an invariant
        conversion among DTDs is a partial but very interesting result of our research.</p>
               <p> The next step will be to extend our approach to XML-Schema and RelaxNG. We plan to
        analyze them by exploiting language theory as well. One point is very important: our goal is
        not to evaluate their expressiveness (many authoritative analyses exist, based on formal
          <b>
                     <span style="font-size:85%">
                        <a href="#MLM00" name="fromMLM00">[MLM00]</a>
                     </span>
                  </b> or informal <b>
                     <span style="font-size:85%">
                        <a href="#LDWCW00" name="fromLDWCW00">[LDWCW00]</a>
                     </span>
                  </b>
        
                  <b>
                     <span style="font-size:85%">
                        <a href="#JR01" name="fromJR01">[JR01]</a>
                     </span>
                  </b> approaches) but their "descriptiveness". In particular, we want to
        prove that the reduction algorithm proposed in the previous section can likewise be applied to
        any schema written in XML-Schema or RelaxNG. </p>
               <p> We then need to verify how much each new feature of these languages impacts
        descriptiveness. Note that XML-Schema and RelaxNG provide users with much more powerful
        features, but most of them are not directly related to the subject of this paper, i.e.
        the (structured) content models of descriptive schemas. The analysis can therefore focus
        only on some relevant aspects. </p>
               <p> In <b>
                     <span style="font-size:85%">
                        <a href="#LDWCW00" name="fromLDWCW00">[LDWCW00]</a>
                     </span>
                  </b> Lee and Chu presented a comparative analysis of six
        schema languages, including DTDs and XML-Schema. We investigated each dimension proposed by
        the authors, in relation to descriptiveness and our patterns, and extended their analysis to
        RelaxNG. This section sketches our results and discusses how both XML-Schema and RelaxNG do
        not extend the "descriptiveness" of DTDs. As a consequence, our pattern-based approach can
        be extended to these languages as well. </p>
               <p> Lee and Chu classified the features of validation languages into seven categories: </p>
               <p>

        
                  <b>Schema</b>

      
               </p>
               <p> The features related to the overall organization, modularization and syntax do not
        impact the descriptiveness, since they do not cover definitions of content-models. The fact
        that XML-Schema and RelaxNG documents are written in XML, the fact that they are
        namespace-aware and the fact that they provide powerful mechanisms to include/import
        external schemas does not extend the set of choices and constructs already available in DTDs
        content-models. </p>
               <p>

        
                  <b>Datatype</b>

      
               </p>
               <p> XML datatypes can be classified into two kinds: simple and complex. Simple types (the only
        ones discussed in the "datatype" section of the original survey) do not impact structured and
        mixed content models, but allow designers to better define the text content of elements and
        attributes. Built-in types, user-defined types and modifiable facets give designers more
        powerful control over atomic objects. In a descriptive scenario, however, they can all be
        generalized as strings (<tt class="code">(#PCDATA)</tt>). </p>
               <p>Note that RelaxNG provides a more flexible solution than XML-Schema, since it allows
        users to import and reuse any existing datatype set (and natively supports XML-Schema
        types). Even so, customized and extensible datatype sets do not impact structured content
        models.</p>
               <p>

        
                  <b>Attribute</b>

      
               </p>
               <p> XML-Schema and RelaxNG strengthen the management and validation of attributes. In particular,
        both of them allow users to associate simple types with an attribute, and overcome the
        CDATA-related limitations of DTDs. Moreover, RelaxNG makes it possible to express choices among
        attributes, some co-constraints over their content, and so on. Although the relation between
        content-models and attributes is very interesting (traditional discussion about
        attribute/element trade offs can be found in <b>
                     <span style="font-size:85%">
                        <a href="#CR02" name="fromCR02">[CR02]</a>
                     </span>
                  </b>), attributes are
        temporarily out of the scope of this research. They in fact do not directly change
        structured and mixed content models. </p>
               <p>

        
                  <b>Element</b>

      
               </p>
               <p> As expected, elements play a central role in our analysis. Lee and Chu identified seven
        aspects of element management. Some of them are irrelevant to our discussion since
        they do not improve the expressiveness of content models, like the presence of default values.
        Others are not discussed here because they are supported by all the languages, like
        ordered sequences and choices among elements, or by none of them, like the open model
        (implemented for instance in Schematron <b>
                     <span style="font-size:85%">
                        <a href="#Jel05" name="fromJel05">[Jel05]</a>
                     </span>
                  </b>). The remaining dimensions are
        very important and are discussed below: <ol type="1">
                     <li>
              
                        <b>content-model</b>: all the schema languages we are
              analyzing support the same set of content models: empty, text, element, or mixed. The
              mixed content model is particularly interesting since it changes from DTDs to XML-Schema
              and RelaxNG: DTDs have only one type of mixed content model (<tt class="code">(#PCDATA | A |
              B)*</tt>), while XML-Schema and RelaxNG give users more control over the position and
              mutual relations of in-line elements. In a descriptive environment, however, designers do
              not need to express constraints and rules over the position of the in-line elements.
              Any specific mixed content model can then be transformed into a general and unordered
              declaration. In RelaxNG things are simpler, since the text is a
                <tt class="code">&lt;rng:text&gt;</tt> element, which can be used like any other element. Even
              there, any prescriptive declaration can be generalized into an XML DTD-like mixed content
              model. </li>
                     <li>
              
                        <b>unordered-sequence</b>: the SGML &amp; operator
              was removed in XML DTDs for the sake of unambiguous validation. XML-Schema and
              RelaxNG restored that operator by introducing <tt class="code">&lt;xs:all&gt;</tt> and
                <tt class="code">&lt;rng:interleave&gt;</tt>. Since our pattern-based model directly uses
              unordered sequences (in records), these operators do not raise issues.
              One remark about RelaxNG is worthwhile: the operator
              <tt class="code">&lt;rng:interleave&gt;</tt> has a more powerful interleaving semantics.
              Consider for instance <tt class="code">A &amp; B*</tt>. Expressed in RelaxNG, it matches
              any interleaving of a sequence containing a single A element and a sequence containing
              zero or more B elements; it thus allows the A element to occur anywhere, including
              between two B elements. Being more general and validating a larger set of documents,
              such a declaration is in line with our descriptive intent. </li>
                     <li>
              
                        <b>min/max occurrences</b>: both XML-Schema and RelaxNG
              allow designers to precisely define the minimum and maximum number of occurrences of
              any element. Such control, particularly useful in a prescriptive environment, is not
              needed in a descriptive one, where users are interested in <i>which</i> elements appear more than in <i>where and how
                many times</i>. Such declarations can then be transformed into DTD-like
              declarations using the '<tt class="code">?</tt>' and '<tt class="code">*</tt>' operators. </li>
                  </ol>
      
               </p>
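               <p> The transformation of occurrence bounds into DTD-like operators can be sketched as follows. This Python fragment is only one possible reading of the relaxation: the function name <tt class="code">occurrence_operator</tt> is hypothetical, and the choice to leave an element that occurs exactly once without any operator is an assumption, since the text mentions only the '?' and '*' operators.</p>

```python
def occurrence_operator(min_occurs=1, max_occurs=1):
    """Map occurrence bounds to a DTD-like operator, relaxing where needed.

    max_occurs is an int or the string "unbounded" (XML-Schema style).
    Any bound that allows repetition is relaxed to "*", so the resulting
    declaration accepts at least the documents accepted by the original.
    """
    if max_occurs == "unbounded" or max_occurs not in (0, 1):
        return "*"      # repetition allowed: exact counts are dropped
    if min_occurs == 0:
        return "?"      # at most one occurrence, possibly none
    return ""           # assumption: exactly one occurrence needs no operator


for bounds in [(1, 1), (0, 1), (2, 5), (1, "unbounded")]:
    print(bounds, repr(occurrence_operator(*bounds)))
```

               <p> Note that a bound such as 2 to 5 occurrences is relaxed to '*', which also admits zero occurrences; this is consistent with the descriptive intent of accepting a larger set of documents.</p>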
               <p>

        
                  <b>Inheritance</b>

      
               </p>
               <p> XML-Schema gives great importance to simple and complex types, and provides users with many
        features from object-oriented programming. In particular, types can be derived by
        restriction and extension. RelaxNG, on the contrary, does not provide explicit support for
        type derivation. Since simple datatyping is not relevant here, the derivation of simple
        types is not of interest either. Some remarks are useful, however, about derivation over
        complex types. XML-Schema allows designers to extend complex types by adding elements and
        attributes only at the end of a content model: any extended type can then be handled as a
        sequence of elements or as any other content model enriched with attributes. Both these
        situations have already been handled in our analysis and pattern-based approach. Derivation
        by restriction, on the other hand, does not impact descriptiveness, since users can design
        loose definitions and ignore prescriptive restrictions. </p>
               <p>

        
                  <b>Being Unique or Key</b>

      
               </p>
               <p> The simple ID/IDREF model of DTDs has been improved by XML-Schema and RelaxNG. While
        the XML-Schema specification directly provides powerful mechanisms to guarantee the uniqueness of
        IDs, elements and attributes, RelaxNG moves the issues related to key uniqueness into a
        separate specification (called RelaxNG Compatibility <b>
                     <span style="font-size:85%">
                        <a href="#CJMM01" name="fromCJMM01">[CJMM01]</a>
                     </span>
                  </b>). Identifiers do
        not impact content models, but the text content of specific attributes. As a consequence,
        they can be neglected in our analysis of LLC conversion. </p>
               <p>

        
                  <b>Miscellaneous</b>

      
               </p>
               <p> The residual class proposed by Lee and Chu addresses other aspects which do not impact
        descriptiveness and invariant conversions. First, they discuss some features that are
        not relevant here because they are supported neither by XML-Schema (and DTDs) nor by RelaxNG,
        like dynamic constraints (which can be turned off and on under specific conditions) or
        element/attribute versioning (which allows authors to define different values of the same
        object in different versions of the same schema). Second, they include aspects related to
        description and documentation, like the possibility of adding annotations and documentation,
        embedding HTML code or producing self-describing specifications. All these
        features are supported to varying degrees by XML-Schema and RelaxNG but do not change the
        expressiveness of element declarations. </p>
               <p> All these considerations are summarized in Table 5, derived from the analysis of Lee
        and Chu. The table shows a fine-grained analysis of the features of DTDs, XML-Schema and
        RelaxNG, focused on their relation with descriptiveness and our pattern-based
        normalization. Future revisions of this work will provide many more details about each
        feature, and a formal analysis of its impact on invariant conversion. </p>
               <div class="table">
                  <h5>Table 5: DTDs, XML-Schema and RelaxNG features analysis</h5>
                  <table summary="DTDs, XML-Schema and RelaxNG features analysis" border="1" width="85%">
                     <colgroup>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                        <col/>
                     </colgroup>
                     <tbody>

            

                        <tr>
                           <td>

                
                              <b>Features</b>

              
                           </td>
                           <td>

                
                              <b>DTD 1.0</b>

              
                           </td>
                           <td>

                
                              <b>XML Schema 1.0</b>

              
                           </td>
                           <td>

                
                              <b>RelaxNG</b>

              
                           </td>
                           <td>

                
                              <b>LLC XML-Schema</b>

              
                           </td>
                           <td>

                
                              <b>LLC RelaxNG</b>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Schema</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>syntax in xml</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>No relevance.</i> Syntactical differences do not
                change expressiveness.</td>
                        </tr>

            

                        <tr>
                           <td>namespace</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>No relevance.</i> Namespaces do not impact
                content-models.</td>
                        </tr>

            

                        <tr>
                           <td>include</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>norelevance</i> Inclusions and imports do not impact
                content-models</td>
                        </tr>

            

                        <tr>
                           <td>import</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>norelevance</i> Inclusions and imports do not impact
                content-models</td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Datatype</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>built-in type</td>
                           <td>10</td>
                           <td>37</td>
                           <td>any</td>
                           <td rowspan="5">
                
                              <i>relevance</i> Simple types do not impact structured
                content-models, and can be reduced into a CDATA in a descriptive environment.</td>
                           <td rowspan="4">Valid for RelaxNG as well</td>
                        </tr>

            

                        <tr>
                           <td>user-defined type</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                        </tr>

            

                        <tr>
                           <td>domain constraint</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                        </tr>

            

                        <tr>
                           <td>null</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                        </tr>

            

                        <tr>
                           <td>extensibility</td>
                           <td>No</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>RELAX NG is not tied to a single set of datatypes. However new simple types do
                not change the structured content models.</td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Attribute</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>default value</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td rowspan="5">
                
                              <i>noanalysis</i> Our analysis does not cover
                attributes and their relations with content models.</td>
                           <td>Valid for RelaxNG as well</td>
                        </tr>

            

                        <tr>
                           <td>choice</td>
                           <td>No</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Choice between attributes is an interesting extra-feature of RelaxNG, not
                relevant here.</td>
                        </tr>

            

                        <tr>
                           <td>optional vs. required</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td rowspan="3">Valid for RelaxNG as well</td>
                        </tr>

            

                        <tr>
                           <td>domain constraint</td>
                           <td>Partial</td>
                           <td>Yes</td>
                           <td>Yes</td>
                        </tr>

            

                        <tr>
                           <td>conditional definition</td>
                           <td>No</td>
                           <td>No</td>
                           <td>Partial</td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Element</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>default value</td>
                           <td>No</td>
                           <td>Partial</td>
                           <td>Partial</td>
                           <td colspan="2">
                
                              <i>norelevance</i> Default values allow designers to
                pre-define types and content-models, but do not change their expressiveness.</td>
                        </tr>

            

                        <tr>
                           <td>content model</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>
                
                              <i>relevance</i> Both schema languages support empty,
                text, element, or mixed content models. DTDs have only one type of mixed content model
                  (<tt class="code">(#PCDATA | A | B)*</tt>), while XML-Schema provides a more powerful
                mechanism. In a descriptive environment, designers do not need to express
                constraints and rules over the position of the in-line elements. Any XML-Schema
                mixed content-model can then be transformed into a general DTD-like block or
                in-line.</td>
                           <td>Valid for RelaxNG as well. In RelaxNG <tt class="code">#PCDATA</tt> is a
                  <tt class="code">&lt;rng:text&gt;</tt> element and can appear in any position of a mixed
                content-model. The XML DTD-like declaration "generalizes" all these scenarios.</td>
                        </tr>

            

                        <tr>
                           <td>ordered sequence</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">

                
                              <i>nodifference</i>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>unordered sequence</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>
                
                              <i>relevance</i> Patterns use the SGML
                <tt class="code">&amp;</tt> operator, equivalent to the XML-Schema
                <tt class="code">&lt;xs:all&gt;</tt>. XML-Schema unordered sequences are therefore directly
                included in our model.</td>
                           <td>In RELAX NG, the corresponding operator (<tt class="code">&lt;rng:interleave&gt;</tt>)
                has more powerful interleaving semantics. RelaxNG declarations are therefore more general
                and validate a larger set of documents, which poses no problem in terms of descriptiveness.</td>
                        </tr>

            

                        <tr>
                           <td>choice</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>nodifference</i> Alternatives are less interesting
                and useful in a descriptive scenario.</td>
                        </tr>

            

                        <tr>
                           <td>min &amp; max occurrence</td>
                           <td>Partial</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>
                
                              <i>relevance</i> In a descriptive environment, authors
                do not need to specify precisely how many times an element can appear within another
                one. XML-Schema declarations about occurrences can then be transformed into DTD-like
                declarations using the '?' and '*' operators.</td>
                           <td>Valid for RelaxNG as well.</td>
                        </tr>

            

                        <tr>
                           <td>open model</td>
                           <td>No</td>
                           <td>No</td>
                           <td>No</td>
                           <td colspan="2">

                
                              <i>nodifference</i>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>conditional definition</td>
                           <td>No</td>
                           <td>No</td>
                           <td>Partial</td>
                           <td>

                
                              <i>nodifference </i>

              
                           </td>
                           <td>
                
                              <i>relevance</i> The importance of co-constraints in a
                descriptive context is very low. Rather than preventing erroneous document
                instances, designers aim at describing the general structure of a large
                set of documents.</td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Inheritance</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>simple type by extension</td>
                           <td>No</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td rowspan="2" colspan="2">
                
                              <i>relevance</i> Simple types do not impact structured
                content-models, and can be reduced into a CDATA in a descriptive environment.</td>
                        </tr>

            

                        <tr>
                           <td>simple type by restriction</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                        </tr>

            

                        <tr>
                           <td>complex type by extension</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>No</td>
                           <td>
                
                              <i>relevance</i> A derived complex type adds elements at
                the end of a content-model and can then be handled as a stand-alone complex type.</td>
                           <td>

                
                              <i>nodifference</i>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>complex type by restriction</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>No</td>
                           <td>
                
                              <i>relevance</i> Restriction is not relevant in a
                descriptive environment, where designers can use general and loose definitions</td>
                           <td>

                
                              <i>nodifference</i>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Being unique or key</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>uniqueness for attribute</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td>---</td>
                           <td rowspan="6">
                
                              <i>norelevance</i> Uniqueness of IDs, elements and
                attributes does not impact content-models and documents' structures</td>
                           <td rowspan="6">Moreover RelaxNG decouples identity-constraints and specifications
                of document structures.</td>
                        </tr>

            

                        <tr>
                           <td>uniqueness for non-attribute</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>---</td>
                        </tr>

            

                        <tr>
                           <td>key for attribute</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>---</td>
                        </tr>

            

                        <tr>
                           <td>key for non-attribute</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>---</td>
                        </tr>

            

                        <tr>
                           <td>foreign key for attribute</td>
                           <td>Partial</td>
                           <td>Yes</td>
                           <td>---</td>
                        </tr>

            

                        <tr>
                           <td>foreign key for non-attribute</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>---</td>
                        </tr>

            

                        <tr>
                           <td colspan="6" align="center">

                
                              <u>Miscellaneous</u>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>dynamic constraint</td>
                           <td>No</td>
                           <td>No</td>
                           <td>No</td>
                           <td colspan="2">

                
                              <i>nodifference</i>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>version</td>
                           <td>No</td>
                           <td>No</td>
                           <td>No</td>
                           <td colspan="2">

                
                              <i>nodifference</i>

              
                           </td>
                        </tr>

            

                        <tr>
                           <td>documentation</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>norelevance</i> Extra information about
                declarations, elements and attributes does not impact content-models</td>
                        </tr>

            

                        <tr>
                           <td>embedded HTML</td>
                           <td>No</td>
                           <td>Yes</td>
                           <td>Yes</td>
                           <td colspan="2">
                
                              <i>norelevance</i> HTML embedding allows designers to
                describe and comment their schemas, but does not modify content models.</td>
                        </tr>

            

                        <tr>
                           <td>self-describability</td>
                           <td>No</td>
                           <td>Partial</td>
                           <td>No</td>
                           <td colspan="2">
                
                              <i>norelevance</i> Self-describability is useful for
                implementers, but does not impact the expressiveness of content-models</td>
                        </tr>

          
                     </tbody>
                  </table>
               </div>
               <p>Legend: <dl>
                     <dd>
                        <i>norelevance</i> = does not change content-model
              expressiveness</dd>
                     <dd>
                        <i>relevance</i> = changes content-model
              expressiveness</dd>
                     <dd>
                        <i>nodifference</i> = supported (or unsupported) by
              both of the languages</dd>
                     <dd>
                        <i>noanalysis</i> = not yet analyzed in detail</dd>
                  </dl>
      
               </p>
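               <p>Two of the reductions described in the table, the loosening of occurrence constraints and the
        generalization of mixed content models, can be illustrated with a minimal Python sketch. It is not the
        converter discussed in this paper: the names <tt class="code">relax_occurrence</tt>,
        <tt class="code">generalize_mixed</tt> and <tt class="code">UNBOUNDED</tt> are hypothetical, and collapsing
        every repeatable range to '*' is an assumption based on the table mentioning only the '?' and '*'
        operators.</p>
               <pre>
```python
# Illustrative sketch only; all names below are hypothetical.

UNBOUNDED = "unbounded"  # XML Schema spelling of an unlimited maxOccurs


def relax_occurrence(min_occurs=1, max_occurs=1):
    """Map an XML Schema occurrence range onto a DTD-like suffix.

    Returns "" (exactly once), "?" (optional) or "*" (repeatable).
    Assumption: any range allowing more than one occurrence becomes "*",
    which accepts a superset of the documents accepted by the original range.
    """
    if max_occurs == 0:
        # The element can never appear, so there is nothing to declare.
        raise ValueError("element is prohibited (maxOccurs is 0)")
    repeatable = max_occurs == UNBOUNDED or max_occurs > 1
    if repeatable:
        return "*"
    if min_occurs == 0:
        return "?"
    return ""


def generalize_mixed(element_names):
    """Build the single DTD-like mixed content model for the given elements.

    Position constraints on in-line elements are dropped: every listed
    element may appear anywhere, any number of times, mixed with text.
    Names are de-duplicated and sorted only to make the output deterministic.
    """
    names = sorted(set(element_names))
    return "(" + " | ".join(["#PCDATA"] + names) + ")*"
```
               </pre>
               <p>Under these assumptions, <tt class="code">relax_occurrence(0, 1)</tt> yields
        <tt class="code">?</tt>, <tt class="code">relax_occurrence(2, 5)</tt> yields <tt class="code">*</tt>, and
        <tt class="code">generalize_mixed(["B", "A", "B"])</tt> yields
        <tt class="code">(#PCDATA | A | B)*</tt>, the DTD mixed content form cited in the table.</p>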
            </div>
            <div class="section">
               <h2>
                  <a name="t7"/>Conclusions</h2>
               <p>This paper is rooted in the traditional conflict between prescriptive and descriptive
        markup languages. Our first goal was to further investigate descriptive schemas and identify
        some subclasses of that approach, in order to prove properties of the pattern-based model we
        presented at Extreme 2005 and currently use in publishing applications.</p>
               <p>Although the paper has intentionally focused on the formal properties of languages
        and schemas, we have also been working on a variety of tools which use internal
        formats based on our patterns. IsaWiki <b>
                     <span style="font-size:85%">
                        <a href="#DIV05" name="fromDIV05">[DIV05]</a>
                     </span>
                  </b>, for instance, is an open
        publishing environment aiming at simplifying web editing processes and allowing users to
        customize (the content of) any web page; IsaLearning <b>
                     <span style="font-size:85%">
                        <a href="#DFMSV06" name="fromDFMSV06">[DFMSV06]</a>
                     </span>
                  </b> is a chain
        of authoring tools which allows users to easily create high-quality learning-objects from
        raw input files. </p>
               <p>The philosophy behind those applications is that a radical simplification of markup
        practice can facilitate the creation of simple but expressive schemas and documents. In
        contexts where data formats are meant to express information extracted by <i>a posteriori</i> analyses, the schemas for these formats
        cannot be restrictive and express <i>a priori</i> validation
        rules. On the contrary, users and applications benefit from more general
        (descriptive) schemas which guarantee compatibility and 'tie' only partial <i>but actually relevant</i> information. This work aims at
        consolidating our theory, by showing how to formalize and automate such a
        generalization/reduction process.</p>
               <p>To go back to our original subject, we plan to further work on formal properties of our
        model. Besides extending our grammar-based analysis to other validation languages, we also
        plan to investigate minimality and correctness of our patterns. From an implementation
        perspective, instead, we will work on actual converters able to automatically transform
        schemas, as well as editors for pattern-based documents.</p>
            </div>
            <hr class="hr"/>
            <h3>
               <i>Acknowledgments</i>
            </h3>
            <p class="first">We thank the members of our research team, Michael Sperberg-McQueen and the anonymous
        reviewers for their helpful comments. </p>
            <hr class="hr"/>
            <h3>
               <i>Bibliography</i>
            </h3>
            <p>
               <b>
                  <a name="BMNS05" href="#fromBMNS05">[BMNS05] </a>
               </b> G. J. Bex, W. Martens, F. Neven, and T. Schwentick, "Expressiveness of XSDs: from
          practice to theory, there and back again", <i>In WWW '05: Proceedings
            of the 14th international conference on World Wide Web</i>, New York, NY, USA,
          ACM Press, 2005. </p>
            <p>
               <b>
                  <a name="BPSMM00" href="#fromBPSMM00">[BPSMM00] </a>
               </b> T. Bray, J. Paoli, C. M. Sperberg-McQueen, and E. Maler, "Extensible Markup Language
          (XML) 1.0", <a href="http://www.w3.org/TR/REC-xml" target="_blank">http://www.w3.org/TR/REC-xml</a>, 2000. </p>
            <p>
               <b>
                  <a name="CJMM01" href="#fromCJMM01">[CJMM01] </a>
               </b> James Clark, and Makoto Murata, "RELAX NG DTD Compatibility", oasis-open.org,
            <a href="http://www.oasis-open.org/committees/relax-ng/compatibility.html" target="_blank">http://www.oasis-open.org/committees/relax-ng/compatibility.html</a>, 03 Dec.
          2001. </p>
            <p>
               <b>
                  <a name="CM01" href="#fromCM01">[CM01] </a>
               </b> James Clark, and Makoto Murata, "RELAX NG",
          <a href="http://relaxng.org/spec-20011203.html" target="_blank">http://relaxng.org/spec-20011203.html</a>, 03 Dec 2001. </p>
            <p>
               <b>
                  <a name="CR02" href="#fromCR02">[CR02] </a>
               </b> Robin Cover, "SGML/XML: Using Elements and Attributes",
            <a href="http://xml.coverpages.org/elementsAndAttrs.html" target="_blank">http://xml.coverpages.org/elementsAndAttrs.html</a>, 26 Aug. 2002. </p>
            <p>
               <b>
                  <a name="DFMSV06" href="#fromDFMSV06">[DFMSV06] </a>
               </b> A. Di Iorio, A.A. Feliziani, S. Mirri, P. Salomoni, and F. Vitali, "Automatically
          Generating Accessible Learning Objects", <i>Journal on Educational
            Technology &amp; Society</i>, Volume 9, Issue 4, 2006. </p>
            <p>
               <b>
                  <a name="DIGV05" href="#fromDIGV05">[DIGV05] </a>
               </b> A. Di Iorio, D. Gubellini, and F. Vitali, "Design Patterns for Descriptive Document
          Substructures", <i>Extreme Markup Conference</i>, Montreal,
          Canada, 2005. </p>
            <p>
               <b>
                  <a name="DIV05" href="#fromDIV05">[DIV05] </a>
               </b> A. Di Iorio and F. Vitali, "From the Writable Web to the Global Editability", <i>Proceedings of ACM Hypertext '05</i>, ACM Press, Salzburg, Austria,
          2005, pp. 35-45. </p>
            <p>
               <b>
                  <a name="Jel05" href="#fromJel05">[Jel05] </a>
               </b> Rick Jelliffe, "Schematron 1.5", <a href="http://xml.ascc.net/schematron/" target="_blank">http://xml.ascc.net/schematron/</a>. </p>
            <p>
               <b>
                  <a name="JR01" href="#fromJR01">[JR01] </a>
               </b> Rick Jelliffe, "The W3C XML Schema Specification in Context", O'Reilly xml.com,
            <a href="http://www.xml.com/pub/a/2001/01/10/schemasincontext.html" target="_blank">http://www.xml.com/pub/a/2001/01/10/schemasincontext.html</a>, 10 Jan. 2001. </p>
            <p>
               <b>
                  <a name="LDWCW00" href="#fromLDWCW00">[LDWCW00] </a>
               </b> Dongwon Lee and Wesley W. Chu, "Comparative analysis of six XML schema languages",
            <i>SIGMOD Rec., ACM Press</i>,
            <a href="http://doi.acm.org/10.1145/362084.362140" target="_blank">http://doi.acm.org/10.1145/362084.362140</a>, New York, USA, 2000. </p>
            <p>
               <b>
                  <a name="MLM00" href="#fromMLM00">[MLM00] </a>
               </b> M.&#160;Murata, D.&#160;Lee, and M.&#160;Mani, "Taxonomy of XML schema
          languages using formal language theory", <i>Extreme Markup Languages</i>, 2000. </p>
            <p>
               <b>
                  <a name="Pie01" href="#fromPie01">[Pie01] </a>
               </b> W. Piez, "Beyond the 'descriptive vs. procedural' distinction", <i>The Extreme Markup Conference</i>, Montreal, Canada, 2001. </p>
            <p>
               <b>
                  <a name="Qui96" href="#fromQui96">[Qui96] </a>
               </b> L. Quin, "Suggestive Markup: Explicit Relationships in Descriptive and Prescriptive
          DTDs", <i>The SGML 96 Conference</i>, Boston, MA, USA, 1996. </p>
            <p>
               <b>
                  <a name="Ren00" href="#fromRen00">[Ren00] </a>
               </b> A. Renear, "The Descriptive/Procedural Distinction is Flawed", <i> Markup Languages: Theory and Practice</i>, 2000. </p>
            <p>
               <b>
                  <a name="TBMM01" href="#fromTBMM01">[TBMM01] </a>
               </b> Henry S. Thompson, David Beech, Murray Maloney, and Noah Mendelsohn, <i>XML Schema Part 1: Structures</i>,
            <a href="http://www.w3.org/TR/xmlschema-1/" target="_blank">http://www.w3.org/TR/xmlschema-1/</a>, May 2001. </p>
            <hr class="hr"/>
            <p class="footertitle">Converting into pattern-based schemas: a formal approach</p>
            <address>Antonina Dattolo [Department of Mathematics and Applications R. Caccioppoli, University of Napoli Federico II]<br class="br"/>
               <a href="mailto:dattolo@unina.it" class="mailto">dattolo@unina.it</a>
            </address>
            <address>Angelo Di Iorio [Department of Computer Science, University of Bologna]<br class="br"/>
               <a href="mailto:diiorio@cs.unibo.it" class="mailto">diiorio@cs.unibo.it</a>
            </address>
            <address>Silvia Duca [Department of Computer Science, University of Bologna]<br class="br"/>
               <a href="mailto:ducas@cs.unibo.it" class="mailto">ducas@cs.unibo.it</a>
            </address>
            <address>Antonio Angelo Feliziani [Department of Computer Science, University of Bologna]<br class="br"/>
               <a href="mailto:afelizia@cs.unibo.it" class="mailto">afelizia@cs.unibo.it</a>
            </address>
            <address>Fabio Vitali [Department of Computer Science, University of Bologna]<br class="br"/>
               <a href="mailto:fabio@cs.unibo.it" class="mailto">fabio@cs.unibo.it</a>
            </address>
            <hr class="hr"/>
         </div>
      </div>
   </body>
</html>
