<?xml version="1.0" encoding="ASCII"?><?xml-stylesheet type="text/xsl" href="../../../mathml/pmathml.xsl"?><html xmlns="http://www.w3.org/1999/xhtml" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:space="preserve">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
      <title>Proceedings of Extreme Markup Languages&#174;</title>
      <link rel="stylesheet" href="../../../extreme-proceedings.css" type="text/css"/>
   </head>
   <body>
      <div id="head">
         <div class="inner">
            <img class="right" src="../../../icons/ExtremeNoDates.jpg"/>
            <h2>
               <i>Proceedings of Extreme Markup Languages<sup>&#174;</sup>
               </i>
            </h2>
         </div>
      </div>
      <div id="left1">
         <div class="inner">
            <h4>On the Lossless Transformation of Single-File,
    Multi-Layer Annotations into Multi-Rooted Trees</h4>
            <address>Andreas Witt </address>
            <address>Oliver Schonefeld </address>
            <address>Georg Rehm </address>
            <address>Jonathan Khoo </address>
            <address>Kilian Evang </address>
            <div class="abstract">
               <h4>Abstract</h4>
               <p class="first">The Generalised Architecture for Sustainability (GENAU)
      provides a framework for the transformation of single-file,
      multi-layer annotations into multi-rooted trees. By employing
      constraints expressed in XCONCUR-CL, this procedure can be
      performed losslessly, i.e., without losing information, especially
      with regard to the nesting of elements that belong to multiple
      annotation layers. This article describes how different types of
      linguistic corpora can be transformed using specialised tools,
      and how constraint rules can be applied to the resulting
      multi-rooted trees to provide an additional level of
      validation.</p>
            </div>
            <p class="keywords">
               <b style="font-size:85%">Keywords:</b> 
               <a href="../../../topics/Validating.html">Validating</a>; <a href="../../../topics/Trees-Graphs.html">Trees/Graphs</a>; <a href="../../../topics/ConcurrentMarkup-Overlap.html">Concurrent Markup/Overlap</a>
            </p>
            <div class="contents">
               <h4>Table of Contents</h4>
               <dl>
                  <dt>
                     <a href="#t1">Introduction</a>
                  </dt>
                  <dt>
                     <a href="#sec-genau">Generalised Architecture for Sustainability (GENAU)</a>
                  </dt>
                  <dt>
                     <a href="#SingleRootedTrees">Transforming Single Rooted Trees</a>
                  </dt>
                  <dl>
                     <dt>
                        <a href="#levelerFlow">Leveler Pipeline</a>
                     </dt>
                     <dt>
                        <a href="#levelerLeveler">The Leveler Web Application</a>
                     </dt>
                     <dl>
                        <dt>
                           <a href="#levelerStep1">Leveler: Step 1</a>
                        </dt>
                        <dt>
                           <a href="#levelerStep2">Leveler: Step 2</a>
                        </dt>
                     </dl>
                     <dt>
                        <a href="#levelerCaveats">Considerations and Error Detection</a>
                     </dt>
                  </dl>
                  <dt>
                     <a href="#TransformingAnnotationGraphs">Transforming Annotation Graphs</a>
                  </dt>
                  <dt>
                     <a href="#sec-xconcur">XCONCUR</a>
                  </dt>
                  <dl>
                     <dt>
                        <a href="#sec-syntax">Document Syntax</a>
                     </dt>
                     <dt>
                        <a href="#sec-xconcur-cl">Validation</a>
                     </dt>
                     <dl>
                        <dt>
                           <a href="#sec-xconcur-cl-building-block">Basic constraint expressions</a>
                        </dt>
                        <dt>
                           <a href="#t5-2-2">Common derived operators</a>
                        </dt>
                        <dt>
                           <a href="#t5-2-3">Rule Evaluation</a>
                        </dt>
                        <dt>
                           <a href="#t5-2-4">Compact Syntax</a>
                        </dt>
                     </dl>
                  </dl>
                  <dt>
                     <a href="#t6">GENAU and XCONCUR</a>
                  </dt>
                  <dt>
                     <a href="#t7">Conclusion</a>
                  </dt>
               </dl>
            </div>
            <div class="authorBio">
               <h4>Andreas Witt</h4>
               <p class="first">
	  Andreas Witt received his PhD in Computational Linguistics
	  and Text Technology from the University of Bielefeld in
	  2002. After graduating in 1996, he started as a researcher
	  and instructor in Computational Linguistics and Text
	  Technology at Bielefeld University. He was heavily involved
	  in the establishment of the minor subject Text Technology in
	  Bielefeld University's Magister and B.A. program. In 2006 he
	  moved to the University of T&#252;bingen, where he is engaged in a
	  project on Sustainability of Linguistic Resources. Witt's
	  main research interests deal with questions on the use and
	  limitations of markup languages for the linguistic
	  description of language data. He is a member of several
	  research organizations, amongst them the TEI Special
	  Interest Group on overlapping markup, for which he wrote
	  parts of the latest version of the chapter "Multiple
	  Hierarchies", which is included in the TEI Guidelines P5.
	</p>
            </div>
            <div class="authorBio">
               <h4>Oliver Schonefeld</h4>
               <p class="first">
	  Oliver Schonefeld studied computer science at Bielefeld
	  University, Germany, until 2005. Since then he has been working
	  in the department of computational linguistics and text
	  technology at Bielefeld University. Parts of this contribution
	  deal with aspects of his forthcoming PhD thesis.
	</p>
            </div>
            <div class="authorBio">
               <h4>Georg Rehm</h4>
               <p class="first">
	  Georg Rehm works in T&#252;bingen University's collaborative
	  research centre Linguistic Data Structures in a project that
	  develops the foundations for sustainable linguistic
	  resources. He holds a PhD in Applied and Computational
	  Linguistics and has been working with SGML and related
	  technologies in the context of Natural Language Processing 
	  (especially with regard to text and corpus analysis as well
	  as ontologies) since 1995.
	</p>
            </div>
            <div class="authorBio">
               <h4>Jonathan Khoo</h4>
               <p class="first">
	  Jonathan Khoo is a Master's student in the International Studies in
	  Computational Linguistics program at the University of
	  T&#252;bingen. He received his B.A. in Linguistics from
	  Northwestern University in 1998. Between his studies, he
	  worked as a web developer focusing on browser-based
	  replacements for traditional rich-client applications. He is
	  currently writing his M.A. thesis on one aspect of this
	  paper.
	</p>
            </div>
            <div class="authorBio">
               <h4>Kilian Evang</h4>
               <p class="first">
	  Kilian Evang is a B.A. student of Computational Linguistics at
	  the University of T&#252;bingen. Furthermore, he works on
	  sustainable linguistic resources.
	</p>
            </div>
         </div>
      </div>
      <div id="right1">
         <div class="inner">
            <div class="front">
               <h1 class="title">On the Lossless Transformation of Single-File,
    Multi-Layer Annotations into Multi-Rooted Trees</h1>
               <address>Andreas Witt [University of T&#252;bingen]</address>
               <address>Oliver Schonefeld [University of Bielefeld]</address>
               <address>Georg Rehm [University of T&#252;bingen]</address>
               <address>Jonathan Khoo [University of T&#252;bingen]</address>
               <address>Kilian Evang [University of T&#252;bingen]</address>
               <h3 class="conference">Extreme Markup Languages 2007&#174; (Montr&#233;al, Qu&#233;bec)</h3>
               <h4>
                  <i>Copyright &#169; 2007 Andreas Witt, Oliver Schonefeld, Georg Rehm, Jonathan Khoo, and Kilian Evang. Reproduced with permission.</i>
               </h4>
            </div>
            <div class="mathml-warning">
               <p>
                  <i>
                     <b>Note:</b>
                  </i> This paper contains <a href="http://www.w3.org/Math/">W3C MathML</a>,
          which is not equally well supported in all browsers. If you have reason to think 
          that mathematical expressions are not displaying properly, consult the 
          <a href="../../../xslfo-pdf/2007/Witt01/EML2007Witt01.pdf">PDF version</a> (or try a different browser).</p>
            </div>
            <div class="section">
               <h2>
                  <a name="t1"/>Introduction</h2>
               <p>Due to the complexity of this type of textual data, the annotation
        of linguistic corpora can be seen as a benchmark test for markup
        languages. In recent years<sup>
                     <span class="highlight">
                        <a href="#tod0e82" name="fromd0e82">1</a>
                     </span>
                  </sup>,
        linguistic analyses have become more and more detailed, and linguists
        have incorporated the results of these analyses into corpora. Furthermore,
        linguistic descriptions applied to texts are extremely
        heterogeneous. This is especially evident if the research belongs to
        different fields of linguistics, e.g., syntax, semantics, and
        phonology.</p>
               <p>This contribution touches upon several different topics with
        regard to using XML-based markup languages for linguistic
        research. Our overall goal is the sustainable archiving of linguistic
        data. Corpora usually contain multiple annotation layers (morphology,
        part-of-speech, syntax, semantics, information structure, etc.). We
        devised a generalised architecture that, among other aspects, requires
        individual conceptual annotation layers contained in linguistic
        corpora to be separated from one another. To meet this prerequisite,
        we have to transform a linguistic corpus (normally represented by a
        single XML file, i.e., a single-rooted tree) into several XML files
        (i.e., a multi-rooted tree) so that each file contains one specific
        annotation layer. After a short introduction to the Generalised
        Architecture for Sustainability of linguistic data (GENAU;
	section <a href="#sec-genau">2</a>), 
        sections <a href="#SingleRootedTrees">3</a> and
	<a href="#TransformingAnnotationGraphs">4</a> describe two
	tools for the purpose of transforming 
        different types of linguistic corpora into multi-rooted trees. While
        the separation of individual annotation layers can be considered an
        important and necessary step for the sustainable archiving of
        linguistic corpora, this process does remove potentially important
        element nesting information. For a transformation from single- to
        multi-rooted trees that is 100% lossless, we need to incorporate a
        mechanism that enables us to store information with regard to element
        nesting. The approach we introduce in this article is based on
        XCONCUR, which we briefly describe in section <a href="#sec-xconcur">5</a>.</p>
               <p>This contribution reports on work in progress within the project
        "Sustainability of Linguistic Data", funded by the German Research
        Foundation (DFG). It is an update to our Extreme Markup Languages 2006
        Late Breaking paper <b>
                     <span style="font-size:85%">
                        <a href="#Woerner2006" name="fromWoerner2006">[W&#246;rner et al. (2006)]</a>
                     </span>
                  </b>.
      
               </p>
            </div>
            <div class="section">
               <h2>
                  <a name="sec-genau"/>Generalised Architecture for Sustainability (GENAU)</h2>
               <p>Since the late 1990s, practically all annotation formats for
        linguistic corpora have been realised as XML-based markup languages
        (see <b>
                     <span style="font-size:85%">
                        <a href="#p3" name="fromp3">[Sperberg-McQueen &amp; Burnard (1994)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#wl2007" name="fromwl2007">[Lehmberg &amp; W&#246;rner (to appear)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#Wagner2005" name="fromWagner2005">[Wagner (2005)]</a>
                     </span>
                  </b> for examples). They usually come in two
        different flavours &#8211; traditionally and in accordance with the built-in
        XML data model, most corpus markup languages form hierarchies that are
        expressed by nested element trees (for example, for the representation
        of syntactic constituents or document structures that are in
        practically all cases modelled based on the OHCO paradigm, i.e., text
        is supposed to be an ordered hierarchy of content objects, see <b>
                     <span style="font-size:85%">
                        <a href="#Renear_93" name="fromRenear_93">[Renear et al. (1993)]</a>
                     </span>
                  </b>).</p>
               <p> In stark contrast to hierarchical data formats are markup
        languages that anchor a data set to a timeline (primarily used for the
        transcription of spoken language) &#8211; this approach borrows
        heavily from Bird and Liberman's Annotation Graphs <b>
                     <span style="font-size:85%">
                        <a href="#Bird_Liberman_01" name="fromBird_Liberman_01">[Bird &amp; Liberman (2001)]</a>
                     </span>
                  </b>. In timeline-based formats such as
        EXMARaLDA <b>
                     <span style="font-size:85%">
                        <a href="#Schmidt2001" name="fromSchmidt2001">[Schmidt (2001)]</a>
                     </span>
                  </b>, the annotator can draw an
        arc from one anchor to another point on the timeline. However, these
        upper-level structures are not represented by nested XML
        element-trees, but with the help of corresponding attribute-value
        pairs. At the same time and regardless of the hierarchical or
        timeline-based model, both approaches usually encode several
        annotation layers concurrently: a certain set of XML elements or
        attributes represents information on morphology, another set
        encapsulates syntactic information, while other elements encode
        linguistic data related to semantic or pragmatic structures. In our
        sustainability project we have to deal with both hierarchical and
        timeline-based corpora (examples can be found in <b>
                     <span style="font-size:85%">
                        <a href="#Woerner2006" name="fromWoerner2006">[W&#246;rner et al. (2006)]</a>
                     </span>
                  </b>) and we have to provide the means for enabling
        users to query both types of resources in a uniform way <b>
                     <span style="font-size:85%">
                        <a href="#Rehm2007" name="fromRehm2007">[Rehm et al. (2007)]</a>
                     </span>
                  </b>. In fact, the original annotation format will be
        irrelevant to the user, as the user interface and the underlying
        technology will abstract from any idiosyncrasies and peculiarities of
        the original corpus data formats. </p>
               <p> We devised an approach called GENAU (Generalised Architecture for
        Sustainability) that is able to cope with the above-mentioned
        difficulties (<b>
                     <span style="font-size:85%">
                        <a href="#Dipper2006" name="fromDipper2006">[Dipper et al. (2006)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#Schmidt2006" name="fromSchmidt2006">[Schmidt et al. (2006)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#Woerner2006" name="fromWoerner2006">[W&#246;rner et al. (2006)]</a>
                     </span>
                  </b>) and that can
        be compared to the NITE Object Model <b>
                     <span style="font-size:85%">
                        <a href="#Carletta_et_al2003" name="fromCarletta_et_al2003">[Carletta et al. (2003)]</a>
                     </span>
                  </b>. The system architecture and our corpus
        processing workflow are depicted in figure <a href="#fig_workflow">1</a>.  First, a corpus to be imported into our web-based
        corpus platform has to be analysed manually. Depending on the XML
        markup used in annotating the respective corpus, the XML document
        instance is transformed into multi-rooted trees. Some corpora can be
        transformed using simple XSLT stylesheets, while others have to be
        processed in this initial stage using a custom set of tools
        (Tool<sub>1</sub>, Tool<sub>2</sub>, etc. in figure <a href="#fig_workflow">1</a>). </p>
               <p>Corpora annotated based on the hierarchical model are analysed by
        a tool that enables us to map XML elements, attributes and textual
        content onto one or more annotation layers as well as primary or
        secondary data layers (see section <a href="#SingleRootedTrees">3</a>). As soon as this mapping exists, the
        <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">
                        <mml:ci encoding="" definitionURL="">n</mml:ci>
                     </mml:math>
                  </span>
	annotation layers identified 
        in the analysis can be exported as
        <span style="font-family: 'Lucida Sans Unicode'">
                     <mml:math overflow="scroll">
                        <mml:ci encoding="" definitionURL="">n</mml:ci>
                     </mml:math>
                  </span> XML document
	instances. In 
        other words, this tool semi-automatically splits hierarchically
        annotated corpora, which typically consist of a single XML document
        instance, into individual XML files, so that each file represents all
        the information related to a single linguistic annotation
        layer. Furthermore, this approach guarantees that overlapping
        structures &#8211; a notorious problem when using single XML documents and,
        hence, single element trees &#8211; can be represented in a
        straightforward way <b>
                     <span style="font-size:85%">
                        <a href="#Witt04" name="fromWitt04">[Witt (2004)]</a>
                     </span>
                  </b>. </p>
               <p>Timeline-based corpora (for example, EXMARaLDA corpora) are split
        using another tool in order to separate the graph annotations that are
        also stored in individual XML files (see section <a href="#TransformingAnnotationGraphs">4</a>). Our approach
        enables us to represent arbitrary types of XML-annotated corpora as
        individual files that can be conceptualised as individual XML element
        trees. In a way, these multi-rooted trees are represented as regular
        XML document instances, but since a single corpus consists of
        <i>multiple</i> files, we need to
        go beyond the functionality offered by typical XML tools, which
        work with single files only. </p>
               <p> Finally, these single XML files are imported into a native XML
        database; currently we use the open source database eXist (see the
        right hand side in figure <a href="#fig_workflow">1</a>). A third tool
        anchors all files to a set of primary data in order to allow
        query-time coordination between the individual files, each of which
        represents a single-rooted tree (<b>
                     <span style="font-size:85%">
                        <a href="#Eckart2007" name="fromEckart2007">[Eckart &amp; Teich (2007)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#Rehm2007" name="fromRehm2007">[Rehm et al. (2007)]</a>
                     </span>
                  </b>).
	  </p>
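               <p>One conceivable way to picture such anchoring is to express every
        element of every layer as a pair of character offsets into the shared
        primary data. The following Python sketch is only an illustration
        under that assumption; it is not the third tool itself, and the
        element names, the helper functions and the two tiny layers are
        hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def build_layer(root_tag, units):
    """Build a layer tree; units are (tag, text) pairs joined by spaces."""
    root = ET.Element(root_tag)
    last = len(units) - 1
    for i, (tag, text) in enumerate(units):
        unit = ET.SubElement(root, tag)
        unit.text = text
        unit.tail = " " if i != last else None
    return root


def spans(root):
    """Return (tag, start, end) character offsets for every element."""
    result = []

    def walk(elem, pos):
        start = pos
        pos += len(elem.text or "")
        for child in elem:
            pos = walk(child, pos)
            pos += len(child.tail or "")
        result.append((elem.tag, start, pos))
        return pos

    walk(root, 0)
    return result


def contained(inner_spans, outer):
    """Select the spans that lie completely inside the outer span."""
    _, a, b = outer
    return [s for s in inner_spans if s[1] >= a and b >= s[2]]


# Two hypothetical layers over the same primary data.
tokens = build_layer("tokens", [("tok", "Wir"), ("tok", "helfen"), ("tok", "uns")])
phrases = build_layer("phrases", [("np", "Wir"), ("vp", "helfen uns")])

primary = "".join(tokens.itertext())
same_text = primary == "".join(phrases.itertext())

# Coordinate the layers: which tokens fall inside the verb phrase?
vp = next(s for s in spans(phrases) if s[0] == "vp")
inside_vp = contained(spans(tokens), vp)
print(primary, same_text, inside_vp)
```

               <p>Because both layers yield the same text, offsets computed in one
        file can be compared directly with offsets computed in another, which
        is what makes a cross-layer lookup of this kind possible.</p>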
               <div class="figure">
                  <a name="fig_workflow"/>
                  <h5>Figure 1: The two main phases of our corpus processing workflow</h5>
                  <img src="EML2007WITT070501.png" border="0" width="100%"/>
                  <h5>[Link to <a href="EML2007WITT070501.png" target="EML2007WITT070501.png">open this graphic in a separate page</a>]</h5>
               </div>
               <p> At the same time, the elements and attributes used in the markup
        languages are analysed and incorporated into an ontology, represented
        in OWL (Web Ontology Language), that encapsulates knowledge about
        linguistic terms and concepts. The ontology is used to generalise over
        the specific and, at times, idiosyncratic names and labels used in the
        corpus annotation markup languages and to provide a coherent, unified,
        and homogeneous perspective on the large set of heterogeneous corpora
        (see the left hand side in figure <a href="#fig_workflow">1</a> as
        well as <b>
                     <span style="font-size:85%">
                        <a href="#Chiarcos2007" name="fromChiarcos2007">[Chiarcos (2007)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#Rehm2007" name="fromRehm2007">[Rehm et al. (2007)]</a>
                     </span>
                  </b>).
        The OWL ontology is the main resource within our query interface:
        users can, for example, search for different combinations of
        part-of-speech tags. Usually, different corpora use different element
        and attribute names for encoding part-of-speech information, but the
        homogenising ontology of linguistic terms and concepts enables us to
        automatically expand a given query into all matching and appropriate
        element and attribute names. <b>
                     <span style="font-size:85%">
                        <a href="#Rehm2007" name="fromRehm2007">[Rehm et al. (2007)]</a>
                     </span>
                  </b> describe this
        approach in detail.
	  </p>
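               <p>The idea of query expansion can be pictured with a minimal Python
        sketch. A plain dictionary stands in for the OWL ontology here, and
        the concept name, the corpus names and the element and attribute
        names are all hypothetical placeholders rather than entries of the
        actual ontology.</p>

```python
# Hypothetical stand-in for the OWL ontology: one linguistic concept mapped
# to the markup names that individual corpora use for it (all names made up).
ONTOLOGY = {
    "PartOfSpeech": {
        "corpus_a": ["pos"],
        "corpus_b": ["@tag", "@cat"],
    },
}


def expand_query(concept: str) -> dict[str, list[str]]:
    """Return, per corpus, the element/attribute names matching a concept."""
    if concept not in ONTOLOGY:
        raise KeyError(f"unknown concept: {concept}")
    # Copy the lists so callers cannot modify the ontology by accident.
    return {corpus: list(names) for corpus, names in ONTOLOGY[concept].items()}


expansion = expand_query("PartOfSpeech")
for corpus, names in sorted(expansion.items()):
    print(corpus, names)
```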
            </div>
            <div class="section">
               <h2>
                  <a name="SingleRootedTrees"/>Transforming Single Rooted Trees</h2>
               <p> The multiple annotation layers contained in a regular
      XML-annotated corpus may need to be separated in order to be transformed
      into other multi-hierarchical annotation formats such as XCONCUR (and to
      comply with the GENAU approach). We developed a pipeline called Leveler
      for such XML-document transformations. Its purpose is two-fold:
		<ol type="1">
                     <li>Moving certain PCDATA content to attributes (in order to
              separate PCDATA content that represents annotations from the
              actual primary data of the corpus which are, in practically all
              cases, also PCDATA).</li>
                     <li>Splitting the corpus into different files according to the
              different layers of annotation (e.g., syntactic, morphological,
              etc.).</li>
                  </ol> The actual transformations are carried out using XSLT and
		are directed by configuration files created in a web application
		(also) called <b>Leveler</b>.
	  </p>
               <div class="subsec1">
                  <h3>
                     <a name="levelerFlow"/>Leveler Pipeline</h3>
                  <p>As shown in figure <a href="#fig_leveler_pipeline">2</a>, there are two main steps to the Leveler process, each
		corresponding to one of the goals stated above.  The original corpus
		(C<sub>0</sub>) is used as input to Leveler to generate the Step 1
		configuration file, which contains directives for converting annotation
		information stored as PCDATA to attribute values. An XSL
		transformation then takes C<sub>0</sub> and the configuration file as
		input to generate a text-transformed corpus file, C<sub>1</sub>. At
		this point, extracting the PCDATA from C<sub>1</sub> should
		yield just the plain text of the corpus without annotation
		information. </p>
                  <p> C<sub>1</sub> is itself used as input to
		another part of the Leveler tool to generate the Step 2 configuration
		file, which specifies the different layers of annotation found in the
		corpus as well as which XML elements belong to them. Another XSL
		transformation takes this configuration file and C<sub>1</sub> to
		generate C<sub>2</sub>, a layer-annotated, text-transformed
		corpus. Layer information is simply added to the annotation elements
		as an attribute, and as such, the plain text of the corpus is
		preserved. </p>
                  <p>C<sub>2</sub> can then be fed into a final
		XSL transformation which creates an XML document for each layer, each
		containing only those nodes which belong to that layer. Each of these
		files will contain the same PCDATA information, although the branching
		of the element tree structure may be different. Leveler uses its own dedicated 
		namespace for automatically inserted elements.
		</p>
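                  <p>The three stages described above can be sketched as follows.
		Leveler itself performs them with XSLT; this Python version is only a
		stand-in that makes the data flow concrete. The attribute names
		<tt class="code">value</tt> and <tt class="code">layer</tt>, the layer
		names, and the two in-memory configurations are assumptions made for
		the illustration, and the dedicated Leveler namespace is not
		modelled.</p>

```python
import copy
import xml.etree.ElementTree as ET


def build_corpus() -> ET.Element:
    """Build a tiny C0: one sentence with two tokens (orth + pos each)."""
    s = ET.Element("s")
    for word, tag in [("Wir", "PPER"), ("helfen", "VVINF")]:
        tok = ET.SubElement(s, "tok")
        ET.SubElement(tok, "orth").text = word
        ET.SubElement(tok, "pos").text = tag
    return s


def step1(c0, to_attribute):
    """C0 to C1: move annotation PCDATA into an attribute."""
    c1 = copy.deepcopy(c0)
    for elem in c1.iter():
        if elem.tag in to_attribute and elem.text:
            elem.set("value", elem.text.strip())  # "value" is a made-up name
            elem.text = None
    return c1


def step2(c1, layers):
    """C1 to C2: record each element's layer in a 'layer' attribute."""
    c2 = copy.deepcopy(c1)
    for elem in c2.iter():
        if elem.tag in layers:
            elem.set("layer", layers[elem.tag])
    return c2


def step3(c2, layer):
    """C2 to one per-layer tree: keep the root and that layer's elements."""
    builder = ET.TreeBuilder()

    def emit(elem, is_root):
        keep = is_root or elem.get("layer") == layer
        if keep:
            builder.start(elem.tag, dict(elem.attrib))
        if elem.text:
            builder.data(elem.text)  # text survives even if the tag is dropped
        for child in elem:
            emit(child, False)
            if child.tail:
                builder.data(child.tail)
        if keep:
            builder.end(elem.tag)

    emit(c2, True)
    return builder.close()


c0 = build_corpus()
c1 = step1(c0, to_attribute={"pos"})
c2 = step2(c1, layers={"tok": "tokens", "orth": "tokens", "pos": "pos"})
per_layer = {name: step3(c2, name) for name in ("tokens", "pos")}
for name, tree in per_layer.items():
    print(name, ET.tostring(tree, encoding="unicode"))
```

                  <p>In this sketch every per-layer tree carries the same text as
		C<sub>1</sub>, while the element structure differs from layer to
		layer.</p>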
                  <div class="figure">
                     <a name="fig_leveler_pipeline"/>
                     <h5>Figure 2: The Leveler pipeline</h5>
                     <img src="EML2007WITT070502.png" border="0"/>
                     <h5>[Link to <a href="EML2007WITT070502.png" target="EML2007WITT070502.png">open this graphic in a separate page</a>]</h5>
                  </div>
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="levelerLeveler"/>The Leveler Web Application</h3>
                  <p> The bulk of the work is done with a web-based tool, Leveler,
          which takes user input in combination with an XML document to
          generate configuration files for use in the XSL
          transformations. Leveler is written in PHP with the SimpleXML
          extension and presents the user with a series of web forms. 
		</p>
                  <div class="figure">
                     <a name="fig_leveler_main"/>
                     <h5>Figure 3: Leveler main screen</h5>
                     <img src="EML2007WITT070503.png" border="0" width="100%"/>
                     <h5>[Link to <a href="EML2007WITT070503.png" target="EML2007WITT070503.png">open this graphic in a separate page</a>]</h5>
                  </div>
                  <p>Figure <a href="#fig_leveler_main">3</a> shows the main Leveler
          web form. The first fieldset, "Basic Information", is used for both
          Step 1 and Step 2. Here the user specifies the XML document filename
          and can optionally add Comment, Operator, and Project metadata to
          the resulting configuration file. To advance to Step 1, the user
          clicks the "Submit Step 1" button, uploading the XML file for
          server-side processing.
		</p>
                  <div class="subsec2">
                     <h4>
                        <a name="levelerStep1"/>Leveler: Step 1</h4>
                     <p>In the first step (figure <a href="#fig_leveler_step1">4</a>),
		  Leveler parses C<sub>0</sub> and returns a table that contains a row
		  for each element in the XML document that has textual information
		  (i.e., contains PCDATA); the elements that do not contain text
		  nodes are listed at the bottom of the page for reference. For each
		  element, the user selects one of three transformations by clicking
		  on a radio button (the &#8704; link at the top of each column is a
		  shortcut that sets all elements to that transformation type). Upon
		  submission of this form, Leveler generates an XML configuration file
		  that lists the elements and specifies how PCDATA within them should
		  be handled.</p>
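                     <p>The split between elements with and without textual
		  information can be sketched in a few lines of Python. Leveler itself
		  is written in PHP with SimpleXML, so this is not its code; the
		  function names and the <tt class="code">corpus</tt> wrapper element
		  are hypothetical, and "contains PCDATA" is interpreted here as
		  "has non-whitespace text directly inside the element".</p>

```python
import xml.etree.ElementTree as ET


def build_fragment() -> ET.Element:
    """Build a small fragment with tok, orth, pos and punc elements."""
    root = ET.Element("corpus")  # wrapper element made up for the sketch
    tok = ET.SubElement(root, "tok", id="s119n0")
    ET.SubElement(tok, "orth").text = "Wir"
    ET.SubElement(tok, "pos", func="HD").text = "PPER"
    ET.SubElement(root, "punc").text = "."
    return root


def classify_elements(root):
    """Split element names into those with and without textual content."""
    with_text, seen = set(), set()
    for elem in root.iter():
        seen.add(elem.tag)
        # Text directly inside the element: its own text plus children's tails.
        chunks = [elem.text] + [child.tail for child in elem]
        if any(chunk and chunk.strip() for chunk in chunks):
            with_text.add(elem.tag)
    return sorted(with_text), sorted(seen - with_text)


text_elements, empty_elements = classify_elements(build_fragment())
print("rows in the Step 1 table:", text_elements)
print("listed for reference:", empty_elements)
```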
                     <div class="figure">
                        <a name="fig_leveler_step1"/>
                        <h5>Figure 4: Step 1 Leveler web form</h5>
                        <img src="EML2007WITT070504.png" border="0" width="100%"/>
                        <h5>[Link to <a href="EML2007WITT070504.png" target="EML2007WITT070504.png">open this graphic in a separate page</a>]</h5>
                     </div>
                     <p> The three text transformations are detailed in the table
            below, with examples that show each transformation as applied to the
            following corpus XML fragment, which contains words in
            <tt class="code">orth</tt> elements and their parts of speech in
            <tt class="code">pos</tt> elements:</p>
                     <div class="figure">
                        <a name="figXMLFrag"/>
                        <h5>Figure 5: Sample XML fragment</h5>
                        <p>
              

                           <div class="codeblock">
                              <pre>&lt;tok id="s119n0"&gt;
    &lt;orth&gt;Wir&lt;/orth&gt;
    &lt;pos func="HD"&gt;PPER&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n1"&gt;
    &lt;orth&gt;m&#252;ssen&lt;/orth&gt;
    &lt;pos func="HD"&gt;VMFIN&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n2"&gt;
    &lt;orth&gt;uns&lt;/orth&gt;
    &lt;pos func="HD"&gt;PRF&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n3"&gt;
    &lt;orth&gt;selbst&lt;/orth&gt;
    &lt;pos func="HD"&gt;ADV&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n4"&gt;
    &lt;orth&gt;helfen&lt;/orth&gt;
    &lt;pos func="HD"&gt;VVINF&lt;/pos&gt;
&lt;/tok&gt;
&lt;punc&gt;.&lt;/punc&gt;</pre>
                           </div>
            
                        </p>
                     </div>
                     <h4>Types of Text Transformations (Step 1)</h4>
                     <table border="0" cellpadding="8" class="deflist">
                        <tr>
                           <td valign="top">
                
                              <b>Real PCDATA</b>
              
                           </td>
                           <td valign="top">
                              <p class="first">Identity transformation - all PCDATA nodes remain
                  PCDATA. Note that in the example output the part of speech
                  tags remain PCDATA nodes; extraction of text nodes results
                  in the intermingling of POS labels: "Wir PPER m&#252;ssen VMFIN
                  uns PRF selbst ADV helfen VVINF ."</p>
                              <p>
                  

                                 <div class="codeblock">
                                    <pre>&lt;tok id="s119n0"&gt;
    &lt;orth&gt;Wir&lt;/orth&gt;
    &lt;pos func="HD"&gt;PPER&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n1"&gt;
    &lt;orth&gt;m&#252;ssen&lt;/orth&gt;
    &lt;pos func="HD"&gt;VMFIN&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n2"&gt;
    &lt;orth&gt;uns&lt;/orth&gt;
    &lt;pos func="HD"&gt;PRF&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n3"&gt;
    &lt;orth&gt;selbst&lt;/orth&gt;
    &lt;pos func="HD"&gt;ADV&lt;/pos&gt;
&lt;/tok&gt;
&lt;tok id="s119n4"&gt;
    &lt;orth&gt;helfen&lt;/orth&gt;
    &lt;pos func="HD"&gt;VVINF&lt;/pos&gt;
&lt;/tok&gt;
&lt;punc&gt;.&lt;/punc&gt;</pre>
                                 </div>
                
                              </p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">
                
                              <b>Annotation</b>
              
                           </td>
                           <td valign="top">
                              <p class="first">Text data is converted to an introduced attribute
                  <tt class="code">text</tt> (in the Leveler namespace) of the containing element. </p>
                              <p>Example:
                  <tt class="code">pos</tt> set to "Annotation", the rest set to "Real
                  PCDATA".  This results in a fragment where the PCDATA is the
                  pure corpus text: "Wir m&#252;ssen uns selbst helfen .".</p>
                              <p>
                  

                                 <div class="codeblock">
                                    <pre>&lt;tok id="s119n0"&gt;
    &lt;orth&gt;Wir&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="PPER" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n1"&gt;
    &lt;orth&gt;m&#252;ssen&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="VMFIN" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n2"&gt;
    &lt;orth&gt;uns&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="PRF" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n3"&gt;
    &lt;orth&gt;selbst&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="ADV" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n4"&gt;
    &lt;orth&gt;helfen&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="VVINF" /&gt;
&lt;/tok&gt;
&lt;punc&gt;.&lt;/punc&gt;</pre>
                                 </div>
                
                              </p>
                           </td>
                        </tr>
                        <tr>
                           <td valign="top">
                
                              <b>Mixed</b>
              
                           </td>
                           <td valign="top">
                              <p class="first">Text data remains as PCDATA of the element but is also duplicated as an 
                  introduced attribute <tt class="code">text</tt> (in the Leveler namespace). </p>
                              <p> Example:
                  <tt class="code">pos</tt> set to "Annotation", <tt class="code">punc</tt> set
                  to "Mixed", the rest set to "Real PCDATA". This also results
                  in a fragment where the PCDATA is the pure corpus text: "Wir
                  m&#252;ssen uns selbst helfen .".</p>
                              <p>
                  

                                 <div class="codeblock">
                                    <pre>&lt;tok id="s119n0"&gt;
    &lt;orth&gt;Wir&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="PPER" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n1"&gt;
    &lt;orth&gt;m&#252;ssen&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="VMFIN" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n2"&gt;
    &lt;orth&gt;uns&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="PRF" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n3"&gt;
    &lt;orth&gt;selbst&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="ADV" /&gt;
&lt;/tok&gt;
&lt;tok id="s119n4"&gt;
    &lt;orth&gt;helfen&lt;/orth&gt;
    &lt;pos func="HD" leveler:text="VVINF" /&gt;
&lt;/tok&gt;
&lt;punc leveler:text="."&gt;.&lt;/punc&gt;</pre>
                                 </div>
                
                              </p>
                           </td>
                        </tr>
                     </table>
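                     <p>The three transformations can be illustrated with a short
            Python sketch. It is not part of Leveler, which is XSLT-based, and
            makes several assumptions: the namespace URI bound to
            <tt class="code">leveler</tt> is a placeholder, the function name
            <tt class="code">apply_text_transformations</tt> and the mode labels
            are hypothetical, and only the text before an element's first
            child is considered.</p>

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption); the real Leveler URI is not given here.
LEVELER_NS = "http://example.org/leveler"
ET.register_namespace("leveler", LEVELER_NS)
TEXT_ATTR = "{%s}text" % LEVELER_NS


def apply_text_transformations(root, modes):
    """Apply 'real', 'annotation' or 'mixed' per element name.

    Elements not listed in `modes` are treated as Real PCDATA (identity).
    """
    for elem in root.iter():
        text = (elem.text or "").strip()
        if not text:
            continue  # element has no character data of its own
        mode = modes.get(elem.tag, "real")
        if mode in ("annotation", "mixed"):
            elem.set(TEXT_ATTR, text)  # copy the text into leveler:text
        if mode == "annotation":
            elem.text = None           # and remove it from the PCDATA
    return root


fragment = """<s>
<tok id="s119n4"><orth>helfen</orth><pos func="HD">VVINF</pos></tok>
<punc>.</punc>
</s>"""

root = apply_text_transformations(
    ET.fromstring(fragment), {"pos": "annotation", "punc": "mixed"}
)
# Remaining PCDATA, with whitespace runs collapsed.
corpus_text = " ".join("".join(root.itertext()).split())
print(corpus_text)
print(ET.tostring(root, encoding="unicode"))
```

                     <p>With <tt class="code">pos</tt> set to "annotation" and
            <tt class="code">punc</tt> set to "mixed", the remaining PCDATA of
            this fragment is the pure corpus text, as in the third table row.</p>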
                  </div>
                  <div class="subsec2">
                     <h4>
                        <a name="levelerStep2"/>Leveler: Step 2</h4>
                     <p>In the second step, users enter a list of up to 8 annotation
          layers (e.g., "morph", "sem", "syn") that are found in the corpus
          before uploading the XML document (C<sub>1</sub>) for analysis
          (figure <a href="#fig_leveler_main">3</a>). After the analysis of
          the document the user is again presented with a table where each row
          is a (unique) element as in figure <a href="#fig_leveler_step2">6</a>. For each element the user can then
          specify which annotation layer(s) it belongs to (with shortcuts to
          assign all elements to or remove all elements from a particular
          layer). Like Step 1, submission of this form results in an XML
          configuration file which contains a list of layer names and a list
          of element names with layer membership information (IDREF to a layer
          ID).</p>
                     <div class="figure">
                        <a name="fig_leveler_step2"/>
                        <h5>Figure 6: Step 2 Leveler web form</h5>
                        <img src="EML2007WITT070505.png" border="0" width="100%"/>
                        <h5>[Link to <a href="EML2007WITT070505.png" target="EML2007WITT070505.png">open this graphic in a separate page</a>]</h5>
                     </div>
                     <p>Because of the number of files and steps involved, a Java
			utility called <b>LevelRunner</b>
			provides a simple graphical user interface for running the XSL
			transformations. Users select which step they are on, then use
			file selectors to choose a source XML file, a configuration file
			(i.e., output from the web application), and an output
			directory. Clicking on a "Run" button shells out to Saxon to run
			the appropriate transformation. (Note that the actual command used
			is configurable so users are not locked into a particular
			processor.)</p>
                  </div>
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="levelerCaveats"/>Considerations and Error Detection</h3>
                  <p>The web application should be run on a local network (or on the
		  local machine) because uploading a corpus to the web server for
		  parsing by SimpleXML/PHP usually transfers a large amount of data
		  (several tens, if not hundreds, of megabytes); PHP itself must be
		  configured to handle very large HTTP POST requests. Server-side
		  parsing was chosen because users could not be guaranteed to have
		  a specific browser (e.g., one that supports XML loading and
		  parsing), although a future switch to client-side XML document
		  analysis is not out of the question. Performance so far has not
		  been an issue, even with corpora of several hundred megabytes.</p>
                  <p> It could be the case that in Step 2, users do not assign the
		  element(s) containing the text of the corpus (as PCDATA) to one or
		  possibly all layers, meaning the text of the corpus is not the same
		  from one layer to the next. This would cause standoff annotation
		  methods such as XCONCUR to fail since segments will be missing and
		  location information would differ for each layer. This could be
		  solved by using the <tt class="code">normalize</tt> program<sup>
                        <span class="highlight">
                           <a href="#tod0e385" name="fromd0e385">2</a>
                        </span>
                     </sup> that tries to normalize whitespace in
          one or more XML files and throws an error if normalisation is
          impossible (e.g., missing PCDATA nodes).</p>
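                  <p>The kind of check involved can be sketched in Python as follows.
          This is not the <tt class="code">normalize</tt> program itself; it
          is a minimal illustration that assumes each layer is available as an
          XML string and that whitespace may be ignored entirely when
          comparing layers. The function names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def text_without_whitespace(xml_string):
    """Return the document's character data with all whitespace removed."""
    root = ET.fromstring(xml_string)
    return "".join("".join(root.itertext()).split())


def check_layers(layers):
    """Raise ValueError if the layers do not share the same primary text.

    `layers` maps a layer name to the XML string of that layer.
    """
    texts = {name: text_without_whitespace(doc) for name, doc in layers.items()}
    reference_name, reference = next(iter(texts.items()))
    for name, text in texts.items():
        if text != reference:
            raise ValueError(
                "layer %r differs from layer %r in its text" % (name, reference_name)
            )
    return reference


consistent = {
    "morph": "<c><tok><orth>Wir</orth></tok> <tok><orth>helfen</orth></tok></c>",
    "syn": "<c><s>\n<orth>Wir</orth>\n<orth>helfen</orth>\n</s></c>",
}
print(check_layers(consistent))

# A layer to which the second text-bearing element was never assigned.
broken = dict(consistent, syn="<c><s><orth>Wir</orth></s></c>")
try:
    check_layers(broken)
except ValueError as error:
    print("error:", error)
```

                  <p>A layer that lacks some of the text-bearing elements is
          reported here instead of silently producing different offsets in
          each layer.</p>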
               </div>
            </div>
            <div class="section">
               <h2>
                  <a name="TransformingAnnotationGraphs"/>Transforming Annotation Graphs</h2>
               <p>Annotation Graphs (AGs, <b>
                     <span style="font-size:85%">
                        <a href="#Bird_Liberman_01" name="fromBird_Liberman_01">[Bird &amp; Liberman (2001)]</a>
                     </span>
                  </b>) are
        used for the representation of multi-layered linguistic
        annotations. They allow for the declaration of an unlimited number of
        annotation layers and for an unrestricted linking of points in the
        text and in the annotation. The formal framework of Annotation Graphs
        is instantiated in different annotation tools and annotation models. A
        prominent application of AGs is EXMARaLDA <b>
                     <span style="font-size:85%">
                        <a href="#Schmidt2001" name="fromSchmidt2001">[Schmidt (2001)]</a>
                     </span>
                  </b>. This section describes the transformation of
        EXMARaLDA into GENAU.</p>
               <p>We developed Splitter, an XSLT-based tool that operates on
        documents in the time-based EXMARaLDA basic-transcription format and
        that converts them into a hierarchy-based format called <i>split-transcription</i>. The XML-based EXMARaLDA
        <i>basic-transcription</i> format uses a
        time-based data model, the <i>"single timeline, 
        multiple tiers"</i> (STMT) data
        model (<b>
                     <span style="font-size:85%">
                        <a href="#Schmidt2005" name="fromSchmidt2005">[Schmidt (2005)]</a>
                     </span>
                  </b>). Information about speakers'
        verbal and nonverbal actions, other events, and various annotations
        referring to these actions and events is contained in several
        <i>tiers</i>.  Each tier is identified by
        two attributes: The <i>category</i> of
        data it contains &#8211; orthographic transcription of verbal actions
        vs. phonetic transcription of verbal actions vs. description of
        nonverbal actions vs.  annotation of manner of articulation of
        actions, and so on &#8211; and the <i>speaker</i> whose actions this particular tier
        contains; tiers of <i>speakerless</i>
        categories &#8211; e.g. for description of noises heard in the
        background &#8211; lack the speaker attribute, of course. Thus,
        actions of any two different speakers participating in the discourse
        are physically removed from each other in an EXMARaLDA file by being
        stored on different tiers, even if their category is the same. Also,
        actions of the same speaker but of different categories &#8211;
        e.g. Mary saying "I do not know" and shrugging at the same time
        &#8211; don't appear proximate in an EXMARaLDA file.  Not even, e.g.,
        a phonetic transcription appears in the vicinity of its orthographic
        counterpart. Rather, the temporal relation between events as well as
        the relations between annotations and physical events are expressed
        using a <i>common timeline</i>, to which
        all events from all tiers refer. Consider the following example
        EXMARaLDA <i>basic-transcription</i>
        (contents taken from <b>
                     <span style="font-size:85%">
                        <a href="#Schmidt2005" name="fromSchmidt2005">[Schmidt (2005)]</a>
                     </span>
                  </b>):
        
<div class="codeblock">
                     <pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;basic-transcription&gt;
  &lt;head&gt;
    &lt;meta-information&gt;
      &lt;!-- ... --&gt;
    &lt;/meta-information&gt;
    &lt;speakertable&gt;
      &lt;speaker id="DS"/&gt;
      &lt;speaker id="FB"/&gt;
    &lt;/speakertable&gt;
  &lt;/head&gt;
  &lt;basic-body&gt;
    &lt;common-timeline&gt;
      &lt;tli id="T0" time="0.0"/&gt;
      &lt;tli id="T1"/&gt;
      &lt;tli id="T2"/&gt;
      &lt;tli id="T3"/&gt;
      &lt;tli id="T4"/&gt;
      &lt;tli id="T5"/&gt;
    &lt;/common-timeline&gt;
    &lt;tier id="TIE0" speaker="DS" category="sup" type="a"&gt;
      &lt;event start="T1" end="T3"&gt;faster&lt;/event&gt;
    &lt;/tier&gt;
    &lt;tier id="TIE1" speaker="DS" category="v" type="t"&gt;
      &lt;event start="T0" end="T1"&gt;Okay.&lt;/event&gt;
      &lt;event start="T1" end="T2"&gt;Tr&#232;s bien,&lt;/event&gt;
      &lt;event start="T2" end="T3"&gt;tr&#232;s bien.&lt;/event&gt;
    &lt;/tier&gt;
    &lt;tier id="TIE2" speaker="DS" category="en" type="a"&gt;
      &lt;event start="T0" end="T1"&gt;Okay.&lt;/event&gt;
      &lt;event start="T1" end="T3"&gt;Very good, very good.&lt;/event&gt;
    &lt;/tier&gt;
    &lt;tier id="TIE3" speaker="DS" category="nv" type="d"&gt;
      &lt;event start="T2" end="T4"&gt;right hand raised&lt;/event&gt;
    &lt;/tier&gt;
    &lt;tier id="TIE4" speaker="FB" category="v" type="t"&gt;
      &lt;event start="T2" end="T3"&gt;Alors &#231;a&lt;/event&gt;
      &lt;event start="T3" end="T4"&gt;d&#233;pend ((cough))&lt;/event&gt;
      &lt;event start="T4" end="T5"&gt;un petit peu.&lt;/event&gt;
    &lt;/tier&gt;
    &lt;tier id="TIE5" speaker="FB" category="en" type="a"&gt;
      &lt;event start="T3" end="T5"&gt;That depends, then, a little bit.&lt;/event&gt;
    &lt;/tier&gt;
    &lt;tier id="TIE6" speaker="FB" category="pho" type="a"&gt;
      &lt;event start="T4" end="T5"&gt;[&#603;&#771;tip&#248;:]&lt;/event&gt;
    &lt;/tier&gt;
  &lt;/basic-body&gt;
&lt;/basic-transcription&gt;</pre>
                  </div> 

	As can be seen in this example, events are connected indirectly via
	<i>time line items</i> (TLIs) on the common
	timeline. For example, the first event, "faster", runs from T1 to T3 on a
	tier of speaker DS; that it refers to DS's events "Tr&#232;s bien," and
	"tr&#232;s bien." can be inferred from these two events together covering
	the same interval, T1 to T3, on another tier of the same speaker. Likewise,
	that FB starts talking ("Alors &#231;a") between DS's two "tr&#232;s bien"
	utterances is represented by the TLI T2, to which utterances
	of both speakers are anchored. These temporal relations allow for a
	two-dimensional visualisation of the transcription, such as the "musical
	score" notation that stacks tiers vertically and aligns events with
	respect to their start and end points in time:</p>
               <div class="figure">
                  <a name="fig_exmaralda_transcript"/>
                  <h5>Figure 7: Transcript in the "musical score" notation</h5>
                  <img src="EML2007WITT070506.png" border="0"/>
                  <h5>[Link to <a href="EML2007WITT070506.png" target="EML2007WITT070506.png">open this graphic in a separate page</a>]</h5>
               </div>
               <p>The purpose of Splitter is to transform this time-based format
        into a hierarchy-based format, where the relation between an event and
        the annotation describing it is not principally one of identical
        position on a timeline but a dominance relation in an ordered
        hierarchy. Since in the basic transcription, everything from "tape
        crackles" to the phonetic transcription of "un petit peu"
        ([&#603;&#771;tip&#248;:]) is expressed as <tt class="code">event</tt> elements, a
        distinction between "events" in the narrow sense and "annotations"
        describing them has to be introduced first. The user specifies a
        parameter called <tt class="code">skeletonCategory</tt>. The events on tiers of
        this category are to become the leaves of a <i>multi-rooted tree</i> Splitter produces. Here one
        would probably choose <tt class="code">v</tt>, the orthographic category. They
        will be treated as "events" in the narrow sense, everything else being
        "annotation". </p>
               <p> In the initial step, Splitter takes all the events from all the
        tiers of the skeleton category and puts them into a single,
        chronologically-ordered sequence. The physical separation of
        utterances of different speakers is thereby removed. To keep speakers
        identifiable, the respective speaker ID is added as an attribute to
        each event. Temporally adjacent events belonging to the same speaker
        are treated as a coherent utterance and kept adjacent in the
        transcription, even if there are events belonging to other speakers
        during the utterance. As opposed to the <i>score</i>-like visualisation, the XML
        representation now resembles a <i>script</i> and can intuitively be read line by line:

        
<div class="codeblock">
                     <pre>
&lt;event id="1" speaker="DS" start="T0" end="T1"&gt;Okay.&lt;/event&gt; 
&lt;event id="2" speaker="DS" start="T1" end="T2"&gt;Tr&#232;s bien,&lt;/event&gt;
&lt;event id="3" speaker="DS" start="T2" end="T3"&gt;tr&#232;s bien.&lt;/event&gt; 
&lt;event id="4" trans="sync" ref="3" speaker="FB" start="T2" end="T3"&gt;Alors &#231;a&lt;/event&gt;
&lt;event id="5" speaker="FB" start="T3" end="T4"&gt;d&#233;pend ((cough))&lt;/event&gt;
&lt;event id="6" speaker="FB" start="T4" end="T5"&gt;un petit peu.&lt;/event&gt;</pre>
                  </div> 

		A script is less apt than a score to represent verbal actions taking
        place simultaneously. To facilitate the spotting of overlaps, Splitter
        annotates them explicitly. For example, event 4 comes after event 3 in
        the sequence of XML elements but occupies exactly the same position on
        the timeline, so there is an overlap. This is indicated by giving the
        "overlapping" event (4) two additional attributes, <tt class="code">ref</tt>
        containing the ID of the "overlapped" element (3) and
        <tt class="code">trans</tt> indicating the type of overlap. The following
        values are possible for the <tt class="code">trans</tt> attribute of an event Y
        overlapping an event X:

		<ul>
                     <li>
                        <b>overlap:</b> Classic
              overlap: Y starts during X and continues beyond the end of
              X.</li>
                     <li>
                        <b>sync:</b> X and Y share
              their start points and their end points (see example).</li>
                     <li>
                        <b>sync-shorter:</b> X and Y
              start at the same time, but Y ends before X.</li>
                     <li>
                        <b>sync-longer:</b> X and Y
              start at the same time, but Y ends after X.</li>
                     <li>
                        <b>within:</b> Y takes place
              entirely during X: X starts earlier and ends later or at the
              same time.</li>
                  </ul>
        
        The "skeleton" sequence of events is the common ground for all that is
        to come &#8211; it will never be filtered, reordered, or changed in
        content any more, only marked up with elements annotating the
        events. In our case, we have annotation for suprasegmental features
        (<tt class="code">sup</tt>), an English translation (<tt class="code">en</tt>),
        nonverbal actions (<tt class="code">nv</tt>), and (sporadic) phonetic
        transcription (<tt class="code">pho</tt>). The annotation is to be expressed as additional XML
        elements <i>containing</i> the events
        they describe. For example, to add markup for suprasegmental features,
        Splitter adds <tt class="code">sup</tt> elements as follows:

        
<div class="codeblock">
                     <pre>&lt;sup&gt;
  &lt;event id="1" speaker="DS" start="T0" end="T1"&gt;Okay.&lt;/event&gt;
&lt;/sup&gt;
&lt;sup value="faster" start="T1" end="T3"&gt;
  &lt;event id="2" speaker="DS" start="T1" end="T2"&gt;Tr&#232;s bien,&lt;/event&gt;
  &lt;event id="3" speaker="DS" start="T2" end="T3"&gt;tr&#232;s bien.&lt;/event&gt;
&lt;/sup&gt;
&lt;sup&gt;
  &lt;event id="4" trans="sync" ref="3" speaker="FB" start="T2" end="T3"&gt;Alors &#231;a&lt;/event&gt;
  &lt;event id="5" speaker="FB" start="T3" end="T4"&gt;d&#233;pend ((cough))&lt;/event&gt;
  &lt;event id="6" speaker="FB" start="T4" end="T5"&gt;un petit peu.&lt;/event&gt;
&lt;/sup&gt;</pre>
                  </div> 

		The name of the tier category, <tt class="code">sup</tt>, now serves as a name
        for the annotating elements. Each annotating element corresponds to
        either an event on a tier of the respective category in the source
        document (in this case, "faster" annotating events 2&#8211;3) or a
        chunk of unannotated "skeleton" events (in this case, 1 and
        4&#8211;6). </p>
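               <p>The construction of the skeleton sequence and the
        <tt class="code">trans</tt> values listed above can be sketched in
        Python. Splitter itself is XSLT-based; this is only an illustration
        under simplifying assumptions: events are ordered by start point and
        then by tier order, an event is compared only with earlier events of
        other speakers, and the rule that keeps a speaker's utterance
        contiguous is not implemented. The function names
        <tt class="code">classify</tt> and <tt class="code">skeleton</tt> are
        hypothetical.</p>

```python
import xml.etree.ElementTree as ET


def classify(x, y):
    """Return the trans value of event y relative to an earlier event x, or None.

    x and y are (start, end) pairs of positions on the common timeline.
    """
    (xs, xe), (ys, ye) = x, y
    if ys == xs:
        if ye == xe:
            return "sync"
        return "sync-shorter" if ye < xe else "sync-longer"
    if xs < ys < xe:  # y starts during x
        return "within" if ye <= xe else "overlap"
    return None


def skeleton(transcription, skeleton_category):
    """Collect the events of one category into a single ordered sequence."""
    body = transcription.find("basic-body")
    position = {tli.get("id"): i
                for i, tli in enumerate(body.find("common-timeline"))}
    events = []
    for tier in body.findall("tier"):
        if tier.get("category") != skeleton_category:
            continue
        for ev in tier.findall("event"):
            events.append({"speaker": tier.get("speaker"), "text": ev.text,
                           "start": ev.get("start"), "end": ev.get("end")})
    # Stable sort: events with the same start point keep their tier order.
    events.sort(key=lambda e: position[e["start"]])
    for number, ev in enumerate(events, start=1):
        ev["id"] = number
        span = (position[ev["start"]], position[ev["end"]])
        # Nearest earlier event of another speaker that overlaps this one.
        for earlier in reversed(events[:number - 1]):
            if earlier["speaker"] == ev["speaker"]:
                continue
            earlier_span = (position[earlier["start"]], position[earlier["end"]])
            trans = classify(earlier_span, span)
            if trans:
                ev["trans"], ev["ref"] = trans, earlier["id"]
                break
    return events


SAMPLE = """<basic-transcription><basic-body>
<common-timeline>
  <tli id="T0"/><tli id="T1"/><tli id="T2"/>
  <tli id="T3"/><tli id="T4"/><tli id="T5"/>
</common-timeline>
<tier id="TIE1" speaker="DS" category="v" type="t">
  <event start="T0" end="T1">Okay.</event>
  <event start="T1" end="T2">Très bien,</event>
  <event start="T2" end="T3">très bien.</event>
</tier>
<tier id="TIE4" speaker="FB" category="v" type="t">
  <event start="T2" end="T3">Alors ça</event>
  <event start="T3" end="T4">dépend ((cough))</event>
  <event start="T4" end="T5">un petit peu.</event>
</tier>
</basic-body></basic-transcription>"""

events = skeleton(ET.fromstring(SAMPLE), "v")
for ev in events:
    print(ev["id"], ev["speaker"], ev["start"], ev["end"],
          ev.get("trans", "-"), ev.get("ref", "-"), ev["text"])
```

               <p>For the two <tt class="code">v</tt> tiers of the example
        transcription, only event 4 receives <tt class="code">trans</tt> and
        <tt class="code">ref</tt> values, because it shares both its start and
        its end point with event 3.</p>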
               <p>Simply continuing like this and adding
        elements of other categories would go awry in the general case,
        because annotating events in the source document can overlap freely,
        whereas XML elements cannot overlap &#8211; a strict tree-structure is
        required. Splitter's solution is to create multiple result
        documents. Each result document contains annotating elements of just
        one category &#8211; hence the name "Splitter" &#8211; but all are
        completely identical with respect to the sequence of contained
        events. The set of the result documents can thus be interpreted as a
        multi-rooted tree, with different annotating nodes descending from
        different roots, but ultimately dominating a common sequence of
        leaves. </p>
               <p> The result documents (in this case en.xml, nv.xml, pho.xml, sup.xml, and v.xml) are completed by one <i>split-transcription</i> meta-document,
        which is essentially the same as the source document, except that the
        tiers are replaced by references to the respective documents:

        
<div class="codeblock">
                     <pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;split-transcription xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:f="http://www.sfb441.uni-tuebingen.de/c2/Kilian/XSLT/Funktionen"&gt;
  &lt;head&gt;
    &lt;meta-information&gt;&lt;!-- ... --&gt;&lt;/meta-information&gt;
    &lt;speakertable&gt;
      &lt;speaker id="DS"&gt;
        &lt;abbreviation&gt;DS&lt;/abbreviation&gt;
      &lt;/speaker&gt;
      &lt;speaker id="FB"&gt;
        &lt;abbreviation&gt;FB&lt;/abbreviation&gt;
      &lt;/speaker&gt;
    &lt;/speakertable&gt;
  &lt;/head&gt;
  &lt;split-body&gt;
    &lt;common-timeline&gt;
      &lt;tli id="T0" time="0.0"/&gt;
      &lt;tli id="T1"/&gt;
      &lt;tli id="T2"/&gt;
      &lt;tli id="T3"/&gt;
      &lt;tli id="T4"/&gt;
      &lt;tli id="T5"/&gt;
    &lt;/common-timeline&gt;
    &lt;tier id="sup" category="sup" type="a" href="./split/example/sup.xml"/&gt;
    &lt;tier id="nv" category="nv" type="d" href="./split/example/nv.xml"/&gt;
    &lt;tier id="v" category="v" type="t" href="./split/example/v.xml"/&gt;
    &lt;tier id="en" category="en" type="a" href="./split/example/en.xml"/&gt;
    &lt;tier id="pho" category="pho" type="a" href="./split/example/pho.xml"/&gt;
  &lt;/split-body&gt;
&lt;/split-transcription&gt;
        </pre>
                  </div>
      
               </p>
            </div>
            <div class="section">
               <h2>
                  <a name="sec-xconcur"/>XCONCUR</h2>
               <p>SGML (see <b>
                     <span style="font-size:85%">
                        <a href="#Sgml1986" name="fromSgml1986">[SGML (1986)]</a>
                     </span>
                  </b>, <b>
                     <span style="font-size:85%">
                        <a href="#Goldfarb1990" name="fromGoldfarb1990">[Goldfarb (1990)]</a>
                     </span>
                  </b>) gave authors a facility to create documents
      with multiple, possibly overlapping, hierarchies in a rather easy way by
      its <tt class="code">CONCUR</tt> option. The SGML specification has never been
      implemented completely, and, as a consequence, there is not a single
      SGML system we are aware of with full support for
      <tt class="code">CONCUR</tt>. XCONCUR (introduced by <b>
                     <span style="font-size:85%">
                        <a href="#Hilbert2005" name="fromHilbert2005">[Hilbert et al. (2005)]</a>
                     </span>
                  </b>) provides authors who are familiar with XML a
      standardized and intuitive way to write documents with overlapping
      markup. This is achieved by reviving SGML's <tt class="code">CONCUR</tt> option
      and by introducing this option into XML.</p>
               <p>A set of XML documents with identical primary data<sup>
                     <span class="highlight">
                        <a href="#tod0e573" name="fromd0e573">3</a>
                     </span>
                  </sup> can be
      transformed into XCONCUR, but, unfortunately, there is currently no tool
      to achieve this in a straightforward way. If, however, the set of XML
      documents is converted to a Prolog clause database, as defined within
      the architecture of the Sekimo project (see <b>
                     <span style="font-size:85%">
                        <a href="#Witt2005" name="fromWitt2005">[Witt et al. (2005)]</a>
                     </span>
                  </b>), this clause database can be converted to XCONCUR
      using <tt class="code">prolog2xconcur</tt>.</p>
               <div class="subsec1">
                  <h3>
                     <a name="sec-syntax"/>Document Syntax</h3>
                  <p>XCONCUR's document syntax is very similar to SGML with the
		  <tt class="code">CONCUR</tt> option set to <tt class="code">YES</tt>. Each element is
		  prefixed with an annotation layer id. This annotation layer id is
		  used to assign the specific element to a distinct annotation
		  layer. A <i>generic identifier</i> in
		  terms of XCONCUR is the combination of annotation layer id and
		  element name. It has the form
		  <tt class="code">(&lt;layer-id&gt;)&lt;name&gt;</tt>, where
		  <tt class="code">&lt;layer-id&gt;</tt> is the annotation layer id and
		  <tt class="code">&lt;name&gt;</tt> the element name. The annotation layer id
		  must conform to the XML Namespaces (see <b>
                        <span style="font-size:85%">
                           <a href="#XmlNamespaces2006" name="fromXmlNamespaces2006">[Bray et al. (2006b)]</a>
                        </span>
                     </b>) rules for NCName while the element
		  name must conform to the rules for QName.</p>
                  <p>Generally, XCONCUR is subject to the same restrictions XML (see
		  <b>
                        <span style="font-size:85%">
                           <a href="#Xml2006" name="fromXml2006">[Bray et al. (2006a)]</a>
                        </span>
                     </b>) imposes on SGML (e.g., elements must be
		  closed and tag minimization is not allowed). In contrast to SGML, no elements
		  without an annotation layer id are allowed. Likewise, even if
		  elements with the same element name occur on different annotation
		  layers, each element has to be specified explicitly on its
		  annotation layer by means of the annotation layer id.</p>
                  <p>Similar to XML, each XCONCUR document is required to be
		  well-formed. This well-formedness is defined in terms of XML
		  well-formedness. Each well-formed XCONCUR document can be projected
		  (or decomposed) to a set of well-formed XML documents as follows:
		  select an annotation layer and remove all annotations (tags) which
		  do not belong to the selected layer. Then, remove all annotation
		  layer ids, including the parentheses, and delete any XCONCUR
		  processing instruction(s); repeat this procedure for the remaining
		  annotation layers. An XCONCUR document is well-formed if all layers
		  are well-formed in terms of XML.</p>
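                  <p>The projection procedure can be sketched in Python with two
		  regular expressions. This is a simplified illustration, not an
		  XCONCUR parser: it assumes that no attribute value contains angle
		  brackets and it ignores comments and CDATA sections. The names
		  <tt class="code">project</tt> and
		  <tt class="code">is_well_formed</tt> and the sample document are
		  hypothetical.</p>

```python
import re
import xml.etree.ElementTree as ET

# A start, end, or empty-element tag carrying a layer id, e.g. <(l1)tok n="1">
TAG = re.compile(r"<(/?)\(([^()\s]+)\)([^<>]*)>")
# XCONCUR processing instructions, e.g. <?xconcur ...?> or <?xconcur-schema ...?>
XCONCUR_PI = re.compile(r"<\?xconcur.*?\?>", re.DOTALL)


def project(xconcur_text, layer_id):
    """Project an XCONCUR document onto one annotation layer."""
    without_pis = XCONCUR_PI.sub("", xconcur_text)

    def keep_or_drop(match):
        slash, layer, rest = match.groups()
        if layer != layer_id:
            return ""                    # tag belongs to another layer
        return "<%s%s>" % (slash, rest)  # strip the "(layer-id)" prefix

    return TAG.sub(keep_or_drop, without_pis).strip()


def is_well_formed(xconcur_text, layer_ids):
    """True if every layer projection parses as well-formed XML."""
    for layer_id in layer_ids:
        try:
            ET.fromstring(project(xconcur_text, layer_id))
        except ET.ParseError:
            return False
    return True


doc = (
    '<?xconcur version="1.1" encoding="utf-8"?>\n'
    "<(a)doc><(b)doc><(a)s>one <(b)q>two</(a)s> "
    "<(a)s>three</(b)q> four</(a)s></(b)doc></(a)doc>"
)

print(project(doc, "a"))
print(project(doc, "b"))
print(is_well_formed(doc, ["a", "b"]))
```

                  <p>In the sample, the <tt class="code">q</tt> element of layer
		  <tt class="code">b</tt> overlaps two <tt class="code">s</tt>
		  elements of layer <tt class="code">a</tt>, yet each projection is
		  a well-formed XML document on its own.</p>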
                  <p>XCONCUR currently defines two processing instructions. The
		  schema processing instruction is a replacement for the deprecated
		  DOCTYPE declaration<sup>
                        <span class="highlight">
                           <a href="#tod0e615" name="fromd0e615">4</a>
                        </span>
                     </sup>,
		  which allows an author to assign an annotation schema to a specific
		  annotation layer. The annotation schema may be written in any common
		  schema language like DTD, XML Schema or RelaxNG. Furthermore, one or
		  more constraint processing instructions can be used to associate
		  constraint sets (see section <a href="#sec-xconcur-cl">5-2</a>) to the
		  XCONCUR document. If an annotation schema is assigned to a layer,
		  that layer can be checked for validity. However, an XCONCUR document
		  can only be considered valid if all layers are valid with regard to
		  their annotation schema and if there are no violations of the
		  constraint set.</p>
                  <p>The following example shows an excerpt of the Uppsala corpus of
		  Russian documents (see <b>
                        <span style="font-size:85%">
                           <a href="#Loenngren1993" name="fromLoenngren1993">[L&#246;nngren et al. (1993)]</a>
                        </span>
                     </b>) in a Latin
		  transliteration as an XCONCUR-annotated document. The corpus, which
		  is annotated using the Tusnelda standard, was preprocessed using the
		  Leveler application and split into multiple XML files. These files
		  were combined into a single XCONCUR document. For the sake of
		  readability only two of the four layers are displayed in figure
		  <a href="#fig-xconcur-example">8</a>. Furthermore, only one headline
		  is shown and all of the corpus metadata has been omitted. The first
		  layer (with the annotation layer id <tt class="code">l1</tt>) encodes
		  morphology while the second layer (with the annotation layer id
		  <tt class="code">l2</tt>) captures the document and sentence structure
		  (headlines, paragraphs, and sentences). The smallest unit in the
		  annotation is a word which is annotated in <tt class="code">tok</tt>
		  elements. On both layers the <tt class="code">tok</tt> elements contain
		  <tt class="code">orth</tt> elements, which encode orthography.<sup>
                        <span class="highlight">
                           <a href="#tod0e642" name="fromd0e642">5</a>
                        </span>
                     </sup>
                  </p>
                  <div class="figure">
                     <a name="fig-xconcur-example"/>
                     <h5>Figure 8: Excerpt of the Uppsala corpus as an XCONCUR document
		  (reformatted for readability)</h5>
                     <p>
	    

                        <div class="codeblock">
                           <pre>&lt;?xconcur version="1.1" encoding="utf-8"?&gt;
&lt;?xconcur-schema layer-id="l1" root="tusneldaCorpus" system="tusnelda.dtd"?&gt;
&lt;?xconcur-schema layer-id="l2" root="tusneldaCorpus" system="tusnelda.dtd"?&gt;
&lt;(l1)tusneldaCorpus version="1.0"&gt;&lt;(l2)tusneldaCorpus version="1.0"&gt;
  &lt;!-- metadata sections omitted, here just an excerpt of the body --&gt;
  &lt;(l1)body id="SGID0201"&gt;&lt;(l2)body id="SGID0201"&gt;
    &lt;(l1)div type="unspecified"&gt;&lt;(l2)div type="unspecified"&gt;
      &lt;(l2)head type="main"&gt;
        &lt;(l2)s id="SGID0201.1" n="1"&gt;
          &lt;(l1)tok id="SGID0201.1.1" n="1"&gt;
            &lt;(l2)tok id="SGID0201.1.1" n="1"&gt;
              &lt;(l1)orth&gt;
                &lt;(l2)orth&gt;Kakoj&lt;/(l2)orth&gt;
              &lt;/(l1)orth&gt;
              &lt;(l1)pos leveler:text="pronoun"/&gt;
              &lt;(l1)desc&gt;
                &lt;(l1)feature type="subpos" leveler:text="interrogative"/&gt;
                &lt;(l1)feature type="tag" leveler:text="pronomen_int_nom_sg_masc_adj"/&gt;
                &lt;(l1)feature type="syntactic type" leveler:text="adjectival"/&gt;
                &lt;(l1)lemma leveler:text="kakoj"/&gt;
                &lt;(l1)case leveler:text="nominative"/&gt;
                &lt;(l1)gender leveler:text="masculine"/&gt;
                &lt;(l1)number leveler:text="singular"/&gt;
              &lt;/(l1)desc&gt;
            &lt;/(l2)tok&gt;
          &lt;/(l1)tok&gt;
          &lt;(l1)tok id="SGID0201.1.2" n="2"&gt;
            &lt;(l2)tok id="SGID0201.1.2" n="2"&gt;
              &lt;(l1)orth&gt;
                &lt;(l2)orth&gt;socializm&lt;/(l2)orth&gt;
              &lt;/(l1)orth&gt;
              &lt;(l1)pos leveler:text="noun"/&gt;
              &lt;(l1)desc&gt;
                &lt;(l1)feature type="subpos" leveler:text="common"/&gt;
                &lt;(l1)feature type="tag" leveler:text="substantiv_masc_sg_nom_unb"/&gt;
                &lt;(l1)feature type="animacy" leveler:text="inanimate"/&gt;
                &lt;(l1)lemma leveler:text="socializm"/&gt;
                &lt;(l1)case leveler:text="nominative"/&gt;
                &lt;(l1)gender leveler:text="masculine"/&gt;
                &lt;(l1)number leveler:text="singular"/&gt;
              &lt;/(l1)desc&gt;
            &lt;/(l2)tok&gt;
          &lt;/(l1)tok&gt;
          &lt;(l1)tok id="SGID0201.1.3" n="3"&gt;
            &lt;(l2)tok id="SGID0201.1.3" n="3"&gt;
              &lt;(l1)orth&gt;
                &lt;(l2)orth&gt;narodu&lt;/(l2)orth&gt;
              &lt;/(l1)orth&gt;
              &lt;(l1)pos leveler:text="noun"/&gt;
              &lt;(l1)desc&gt;
                &lt;(l1)feature type="subpos" leveler:text="common"/&gt;
                &lt;(l1)feature type="tag" leveler:text="substantiv_masc_sg_gen_gen2_unb"/&gt;
                &lt;(l1)feature type="subcase" leveler:text="gen_2"/&gt;
                &lt;(l1)feature type="animacy" leveler:text="inanimate"/&gt;
                &lt;(l1)lemma leveler:text="narod"/&gt;
                &lt;(l1)case leveler:text="genitive"/&gt;
                &lt;(l1)gender leveler:text="masculine"/&gt;
                &lt;(l1)number leveler:text="singular"/&gt;
              &lt;/(l1)desc&gt;
            &lt;/(l2)tok&gt;
          &lt;/(l1)tok&gt;
          &lt;(l1)tok id="SGID0201.1.4" n="4"&gt;
            &lt;(l2)tok id="SGID0201.1.4" n="4"&gt;
              &lt;(l1)orth&gt;
                &lt;(l2)orth&gt;nuzhen&lt;/(l2)orth&gt;
              &lt;/(l1)orth&gt;
              &lt;(l1)pos leveler:text="adjective"/&gt;
              &lt;(l1)desc&gt;
                &lt;(l1)feature type="tag" leveler:text="adjektiv_masc_sg_kurzform"/&gt;
                &lt;(l1)feature type="Adjform" leveler:text="short form"/&gt;
                &lt;(l1)lemma leveler:text="nuzhnyj"/&gt;
                &lt;(l1)gender leveler:text="masculine"/&gt;
                &lt;(l1)number leveler:text="singular"/&gt;
                &lt;(l1)degree leveler:text="positive"/&gt;
              &lt;/(l1)desc&gt;
            &lt;/(l2)tok&gt;
          &lt;/(l1)tok&gt;
          &lt;(l1)tok id="SGID0201.1.5" n="5"&gt;
            &lt;(l2)tok id="SGID0201.1.5" n="5"&gt;
              &lt;(l1)orth&gt;
                &lt;(l2)orth&gt;?&lt;/(l2)orth&gt;
              &lt;/(l1)orth&gt;
              &lt;(l1)pos leveler:text="interpunction"/&gt;
              &lt;(l1)desc&gt;
                &lt;(l1)feature type="tag" leveler:text="satzzeichen_fragezeichen"/&gt;
                &lt;(l1)feature type="type" leveler:text="question"/&gt;
                &lt;(l1)lemma leveler:text="?"/&gt;
              &lt;/(l1)desc&gt;
            &lt;/(l2)tok&gt;
          &lt;/(l1)tok&gt;
        &lt;/(l2)s&gt;
      &lt;/(l2)head&gt;
    &lt;!-- ... --&gt;
    &lt;/(l2)div&gt;&lt;/(l1)div&gt;
  &lt;/(l2)body&gt;&lt;/(l1)body&gt;
  &lt;!-- ... --&gt;
&lt;/(l2)tusneldaCorpus&gt;&lt;/(l1)tusneldaCorpus&gt;
</pre>
                        </div>
		  
                     </p>
                  </div>
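                  <p>The well-formedness condition (every layer must be well-formed
		  on its own) can be illustrated with one stack of open elements per
		  layer. The following Python code is a minimal sketch, not part of
		  XCONCUR itself: the function name <tt class="code">layers_well_formed</tt>
		  and the pre-tokenized tuple input are assumptions made for
		  illustration, and only tag nesting is checked, not the remaining XML
		  well-formedness rules.</p>

```python
from collections import defaultdict


def layers_well_formed(tags):
    """Return True if the tags of every layer nest properly on their own.

    `tags` holds (kind, layer, name) tuples in document order, where kind
    is "start" or "end". This pre-tokenized form is an assumption of the
    sketch; a real parser would derive it from the XCONCUR text.
    """
    open_elements = defaultdict(list)  # one stack of element names per layer
    for kind, layer, name in tags:
        if kind == "start":
            open_elements[layer].append(name)
        elif kind == "end":
            stack = open_elements[layer]
            # An end tag must close the innermost open element of its layer.
            if not stack or stack.pop() != name:
                return False
        else:
            raise ValueError(f"unknown tag kind: {kind!r}")
    # No layer may be left with unclosed elements.
    return all(not stack for stack in open_elements.values())


# First token of Figure 8, reduced to tok and orth on both layers.
figure8_token = [
    ("start", "l1", "tok"), ("start", "l2", "tok"),
    ("start", "l1", "orth"), ("start", "l2", "orth"),
    ("end", "l2", "orth"), ("end", "l1", "orth"),
    ("end", "l2", "tok"), ("end", "l1", "tok"),
]

# Tags that cross each other are fine as long as they sit on different layers.
crossing_layers = [
    ("start", "l1", "a"), ("start", "l2", "b"),
    ("end", "l1", "a"), ("end", "l2", "b"),
]

# The same crossing within a single layer is not well-formed.
crossing_same_layer = [
    ("start", "l1", "a"), ("start", "l1", "b"),
    ("end", "l1", "a"), ("end", "l1", "b"),
]

print(layers_well_formed(figure8_token))
print(layers_well_formed(crossing_layers))
print(layers_well_formed(crossing_same_layer))
```

                  <p>Keeping a separate stack per annotation layer id is what lets
		  elements of different layers overlap freely while each layer still
		  forms a tree.</p>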
               </div>
               <div class="subsec1">
                  <h3>
                     <a name="sec-xconcur-cl"/>Validation</h3>
                  <p>As mentioned in section <a href="#sec-syntax">5-1</a>, XCONCUR
		  allows the use of annotation schemas for the validation of each
		  annotation layer. However, to examine each tree by itself (a
		  "tree-by-tree" approach) is not sufficient to validate multiple
		  hierarchies in a proper manner.<sup>
                        <span class="highlight">
                           <a href="#tod0e669" name="fromd0e669">6</a>
                        </span>
                     </sup> XCONCUR-CL, the validation component in
		  XCONCUR, is a constraint-based, cross-layer validation approach for
		  the validation of multiple hierarchies. A constraint schema consists
		  of a set of rules which define relations between an arbitrary
		  number of layers in an XCONCUR document. Initial and related work
		  was presented by <b>
                        <span style="font-size:85%">
                           <a href="#Schonefeld2006" name="fromSchonefeld2006">[Schonefeld &amp; Witt (2006)]</a>
                        </span>
                     </b> and <b>
                        <span style="font-size:85%">
                           <a href="#Schonefeld2007" name="fromSchonefeld2007">[Schonefeld (2007)]</a>
                        </span>
                     </b>. Different approaches to the validation of
		  concurrent markup are described by <b>
                        <span style="font-size:85%">
                           <a href="#SpergbergMcQueen2006" name="fromSpergbergMcQueen2006">[Sperberg-McQueen (2006)]</a>
                        </span>
                     </b> and <b>
                        <span style="font-size:85%">
                           <a href="#Tennison2007" name="fromTennison2007">[Tennison (2007)]</a>
                        </span>
                     </b>.
		</p>
                  <div class="subsec2">
                     <h4>
                        <a name="sec-xconcur-cl-building-block"/>Basic constraint expressions</h4>
                     <p>A basic constraint expression serves as the most fundamental
			building block of a rule in an XCONCUR-CL schema. The general form
			is <tt class="code">operand operator operand</tt>. The operand allows the
			user to refer to the elements in an XCONCUR document while the
			operator defines the relation between these elements. The operand
			takes a parametrized generic identifier (pGI) as an argument that
			does not contain a literal annotation layer id, but an annotation
			layer id variable. This variable will be bound to a literal
			annotation layer id using the constraint processing instruction in
			an XCONCUR instance. All annotation layer id variables that occur
			in a constraint set have to be bound. This approach allows an
			author to write more generic and reusable constraint sets.</p>
                     <p>The following basic operands and operators are defined:
			<h4>Operands</h4>
                        <table border="0" cellpadding="8" class="deflist">
                           <tr>
                              <td valign="top">start[<i>pGI</i>]</td>
                              <td valign="top">
                                 <p class="first">The <i>start</i>
					operand denotes the <i>start
					  tag</i> of an element identified by pGI.</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">end[<i>pGI</i>]</td>
                              <td valign="top">
                                 <p class="first">The <i>end</i>
					operand denotes the <i>end
					  tag</i> of an element identified by pGI.</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">element[<i>pGI</i>]</td>
                              <td valign="top">
                                 <p class="first">The <i>element</i>
					operand denotes the whole <i>element</i> identified by pGI,
					  that is, the range between its start and end tags.</p>
                              </td>
                           </tr>
                        </table>
			
                        <h4>Operators</h4>
                        <table border="0" cellpadding="8" class="deflist">
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;&lt;&lt;&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>precedes</i>
					operator: the position of entity op<sub>a</sub> precedes
					the position of entity op<sub>b</sub>. Entities
					op<sub>a</sub> and op<sub>b</sub> may be the <i>start</i> or <i>end</i> operand. The precedes
					operator realizes a strict precede relation and <i>not</i> a precedes-or-equals
					relation.</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;==&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>equals</i>
					operator: the position of entity op<sub>a</sub> equals the
					position of entity op<sub>b</sub>. Entities op<sub>a</sub>
					and op<sub>b</sub> may be the <i>start</i> or <i>end</i> operand.</p>
                              </td>
                           </tr>
                        </table>
		  
                     </p>
                     <p>Each basic constraint expression will either evaluate to
			<i>true</i> or <i>false</i>. While evaluating the expression,
			the start and end operands will evaluate to a position of the
			corresponding tag. This position is an ordinal value defined as
			the offset into the primary data at which the tag occurs in the
			document. For non-empty elements the start
			position is always less than the end position,
			and for empty elements both positions are
			always equal.
		  </p>
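                     <p>The evaluation of the two basic operators can be sketched in a
			few lines of Python. This is an illustration under assumptions, not
			the XCONCUR-CL implementation: the <tt class="code">Element</tt> type,
			the function names and the concrete offsets are hypothetical.</p>

```python
import operator
from typing import NamedTuple


class Element(NamedTuple):
    """An element as a range over the primary data (offsets are 0-based)."""
    layer: str
    name: str
    start: int  # position of the start tag
    end: int    # position of the end tag


def precedes(pos_a, pos_b):
    """Basic operator: strict precedence, never precedes-or-equals."""
    return operator.lt(pos_a, pos_b)


def equals(pos_a, pos_b):
    """Basic operator: both tags sit at the same offset."""
    return operator.eq(pos_a, pos_b)


# Hypothetical offsets: "Kakoj" is taken to start the primary data, and the
# empty pos element is taken to sit directly after it.
tok = Element("l1", "tok", 0, 5)
pos = Element("l1", "pos", 5, 5)

print(precedes(tok.start, tok.end))  # non-empty element: start before end
print(equals(pos.start, pos.end))    # empty element: both positions equal
print(precedes(pos.start, pos.end))  # strictness: equal positions do not precede
```

                     <p>Because both operators compare plain offsets, tags from
			different layers can be compared directly without reference to either
			layer's tree structure.</p>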
                     <p>Basic constraint expressions can be connected using logical
			operators and explicit grouping to form more powerful expressions.
			<table border="0" cellpadding="8" class="deflist">
                           <tr>
                              <td valign="top">exp<sub>a</sub>&#160;&amp;&amp;&#160;exp<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">
                                    <i>Logical
					  conjunction</i>: the whole expression evaluates
					to true if both subexpressions exp<sub>a</sub> and
					exp<sub>b</sub> evaluate to true.</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">exp<sub>a</sub>&#160;||&#160;exp<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">
                                    <i>Logical
				disjunction</i>: the whole expression evaluates to
				true if at least one of the subexpressions exp<sub>a</sub> and
				exp<sub>b</sub> evaluates to true.</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">!exp</td>
                              <td valign="top">
                                 <p class="first">
                                    <i>Negation</i>: the whole expression
					  evaluates to true, if the expression exp evaluates to
					  false.</p>
                              </td>
                           </tr>
                        </table>
		  
                     </p>
                     <p>The following table defines the precedences of the different
			components of basic constraint expressions and the logical
			operators. An XCONCUR-CL-aware parser should use them to build a
			correct representation of the constraint rules that are used to
			validate an XCONCUR document.</p>
                     <div class="table">
                        <h5>Table 1</h5>
                        <table border="1" width="85%">
                           <colgroup>
                              <col align="center"/>
                              <col align="center"/>
                              <col align="left"/>
                           </colgroup>
                           <thead>
                              <tr>
                                 <th align="center">Precedence</th>
                                 <th align="center">Operator</th>
                                 <th align="center">Comment</th>
                              </tr>
                           </thead>
                           <tbody>
				

                              <tr>
                                 <td>1</td>
                                 <td>== &lt;&lt; </td>
                                 <td>applies to derived operators as well</td>
                              </tr>
				

                              <tr>
                                 <td>2</td>
                                 <td>(expression)</td>
                                 <td>explicit grouping</td>
                              </tr>
				

                              <tr>
                                 <td>3</td>
                                 <td>!</td>
                                 <td>negation</td>
                              </tr>
				

                              <tr>
                                 <td>4</td>
                                 <td>&amp;&amp;</td>
                                 <td>conjunction</td>
                              </tr>
				

                              <tr>
                                 <td>5</td>
                                 <td>||</td>
                                 <td>disjunction</td>
                              </tr>
			  
                           </tbody>
                        </table>
                     </div>
                  </div>
                  <div class="subsec2">
                     <h4>
                        <a name="t5-2-2"/>Common derived operators</h4>
                     <p>Combining basic constraint expressions with
			the logical operators can quickly lead to complex and practically
			unreadable constraint expressions. To counter this, we
			define a set of commonly needed operators. These <i>common derived operators</i> are
			defined solely in terms of the basic operators and the
			logical operators.</p>
                     <p>
			The set contains the following operators:
			<table border="0" cellpadding="8" class="deflist">
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;&gt;&gt;&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>succeeds</i>
					operator: the position of entity op<sub>a</sub> succeeds
					the position of entity op<sub>b</sub>. Entities
					op<sub>a</sub> and op<sub>b</sub> may be the <i>start</i> or <i>end</i> operand.</p>
                                 <p>!(op<sub>a</sub>&#160;&lt;&lt;&#160;op<sub>b</sub>&#160;||&#160;op<sub>a</sub>&#160;==&#160;op<sub>b</sub>)</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;&lt;=&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>precedes-or-equals</i> operator:
					the position of entity op<sub>a</sub> precedes or equals
					the position of entity op<sub>b</sub>. Entities
					op<sub>a</sub> and op<sub>b</sub> may be the <i>start</i> or <i>end</i> operand.</p>
                                 <p>op<sub>a</sub>&#160;&lt;&lt;&#160;op<sub>b</sub>&#160;||&#160;op<sub>a</sub>&#160;==&#160;op<sub>b</sub>
                                 </p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;=&gt;&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>succeeds-or-equals</i> operator: the position
					of entity op<sub>a</sub> succeeds or equals the position of entity
					op<sub>b</sub>. Entities op<sub>a</sub> and op<sub>b</sub> may be
					the <i>start</i> or <i>end</i> operand.</p>
                                 <p>!op<sub>a</sub>&#160;&lt;&lt;&#160;op<sub>b</sub>&#160;||&#160;op<sub>a</sub>&#160;==&#160;op<sub>b</sub>
                                 </p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;[]&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>inside</i> or
					<i>inclusion</i> operator:
					entity op<sub>a</sub> is contained inside entity
					op<sub>b</sub>. Entity op<sub>a</sub> may be the
					<i>start</i>, <i>end</i> or <i>element</i> operand and entity
					op<sub>b</sub> must be an <i>element</i> operand.</p>
                                 <p>start(op<sub>b</sub>)&#160;&lt;&lt;&#160;op<sub>a</sub>&#160;&amp;&amp;&#160;op<sub>a</sub>&#160;&lt;&lt;&#160;end(op<sub>b</sub>)&#160;|&#160;op<sub>a</sub>
					in {start, end}</p>
                                 <p>start(op<sub>b</sub>)&#160;&lt;&lt;&#160;start(op<sub>a</sub>)&#160;&amp;&amp;&#160;end(op<sub>a</sub>)&#160;&lt;&lt;&#160;end(op<sub>b</sub>)&#160;|&#160;op<sub>a</sub>
					in {element}</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;][&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>outside</i> or
					<i>independence</i> operator: entity
					op<sub>a</sub> is not enclosed within the range between the start
					and end tag of entity op<sub>b</sub>. Entity op<sub>a</sub> may be
					the <i>start</i>, <i>end</i> or <i>element</i> operand and entity op<sub>b</sub>
					must be an <i>element</i>
					operand.</p>
                                 <p>!(start(op<sub>b</sub>)&#160;&lt;&lt;&#160;op<sub>a</sub>&#160;&amp;&amp;&#160;op<sub>a</sub>&#160;&lt;&lt;&#160;end(op<sub>b</sub>))&#160;|&#160;op<sub>a</sub>
					in {start, end}</p>
                                 <p>!(start(op<sub>b</sub>)&#160;&lt;&lt;&#160;start(op<sub>a</sub>)&#160;&amp;&amp;&#160;end(op<sub>a</sub>)&#160;&lt;&lt;&#160;end(op<sub>b</sub>))&#160;|&#160;op<sub>a</sub>
					in {element}</p>
                              </td>
                           </tr>
                           <tr>
                              <td valign="top">op<sub>a</sub>&#160;//&#160;op<sub>b</sub>
                              </td>
                              <td valign="top">
                                 <p class="first">The <i>overlap</i>
					operator: entity op<sub>a</sub> overlaps with the range between the
					start and end tag of entity op<sub>b</sub>. Both op<sub>a</sub>
					and op<sub>b</sub> must be <i>element</i> operands.</p>
                                 <p>
					(start(op<sub>a</sub>)&#160;&lt;&lt;&#160;start(op<sub>b</sub>)&#160;&amp;&amp;&#160;start(op<sub>b</sub>)&#160;&lt;&lt;&#160;end(op<sub>a</sub>)&#160;&amp;&amp;&#160;end(op<sub>a</sub>)&#160;&lt;&lt;&#160;end(op<sub>b</sub>))&#160;|| (start(op<sub>b</sub>)&#160;&lt;&lt;&#160;start(op<sub>a</sub>)&#160;&amp;&amp;&#160;start(op<sub>a</sub>)&#160;&lt;&lt;&#160;end(op<sub>b</sub>)&#160;&amp;&amp;&#160;end(op<sub>b</sub>)&#160;&lt;&lt;&#160;end(op<sub>a</sub>))
				  </p>
                              </td>
                           </tr>
                        </table>
		  
                     </p>
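                     <p>To show that the derived operators need nothing beyond the two
			basic operators and the logical operators, the definitions above can
			be transcribed into Python. This is a sketch under assumptions:
			positions are plain integers, <i>element</i> operands are
			(start, end) pairs, and all function names are hypothetical. Only
			the <i>element</i> variants of the inside and outside operators are
			shown.</p>

```python
import operator


def precedes(a, b):  # basic operator
    return operator.lt(a, b)


def equals(a, b):  # basic operator
    return operator.eq(a, b)


def succeeds(a, b):
    return not (precedes(a, b) or equals(a, b))


def precedes_or_equals(a, b):
    return precedes(a, b) or equals(a, b)


def succeeds_or_equals(a, b):
    # Negation binds only to the first comparison, as in the definition.
    return (not precedes(a, b)) or equals(a, b)


def inside(a, b):
    """Element a lies strictly inside element b; both are (start, end) pairs."""
    return precedes(b[0], a[0]) and precedes(a[1], b[1])


def outside(a, b):
    return not inside(a, b)


def overlaps(a, b):
    """Elements a and b cross each other, in either order."""
    return (
        precedes(a[0], b[0]) and precedes(b[0], a[1]) and precedes(a[1], b[1])
    ) or (
        precedes(b[0], a[0]) and precedes(a[0], b[1]) and precedes(b[1], a[1])
    )


# Hypothetical ranges over six characters of primary data.
print(inside((2, 4), (0, 6)))    # strictly contained
print(inside((0, 2), (0, 6)))    # shares the start position with the container
print(overlaps((0, 5), (2, 6)))  # the two ranges cross
```

                     <p>Note that, with the strict precedes operator, an element that
			shares a boundary with its container does not count as inside it.</p>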
                  </div>
                  <div class="subsec2">
                     <h4>
                        <a name="t5-2-3"/>Rule Evaluation</h4>
                     <p>Each constraint expression is evaluated in a specific
			context. To define a context, an addressing language such as XPath
			could be used. However, XPath would need to be extended to deal
			with multiple hierarchies. <b>
                           <span style="font-size:85%">
                              <a href="#Alink2006" name="fromAlink2006">[Alink et al. (2006)]</a>
                           </span>
                        </b> and
			<b>
                           <span style="font-size:85%">
                              <a href="#Eckart2007" name="fromEckart2007">[Eckart &amp; Teich (2007)]</a>
                           </span>
                        </b> propose new axes to XPath for
			this purpose, but for the purpose of simplicity XCONCUR-CL
			currently uses an approach adopted from the selectors in Cascading
			Style Sheets (see <b>
                           <span style="font-size:85%">
                              <a href="#Css2006" name="fromCss2006">[Bos et al. (2006)]</a>
                           </span>
                        </b>).
		  </p>
                     <p>An element in an XCONCUR document can be seen as a range over
			the primary data, which is similar to the approach used in LMNL
			(<b>
                           <span style="font-size:85%">
                              <a href="#Tennison2002" name="fromTennison2002">[Tennison &amp; Piez (2002)]</a>
                           </span>
                        </b>). The context of a constraint
			expression selects an element and spans a range over the primary
			data. The constraint expression is evaluated for all elements
			inside this range. If no specific element is chosen, the <i>universal context</i>, denoted as
			<tt class="code">*</tt>, can be used.
		  </p>
                     <p>Figure <a href="#fig-xconcur-words-doc">9</a> shows an excerpt
			of an XCONCUR document that is annotated on three different
			levels: a syntactic layer (sentences, words, using the annotation
			layer id <tt class="code">s</tt>), a morphological layer (morphemes,
			annotation layer id <tt class="code">m</tt>) and a phonological layer
			(syllables, annotation layer id <tt class="code">p</tt>). The excerpt only
			shows the annotation of a single word ("tables"). Morphemes and
			syllables are annotated on distinct layers, since they may or may not
			overlap. However, both are always included inside of words. A set
			of constraint expressions is given in figure <a href="#fig-xconcur-words-cl">10</a>.
		  
                     </p>
                     <div class="figure">
                        <a name="fig-xconcur-words-doc"/>
                        <h5>Figure 9: Excerpt of a verbose XCONCUR annotation of
				words, morphemes and syllables.</h5>
                        <p>

                           <div class="codeblock">
                              <pre>&lt;!-- ... --&gt;
&lt;(s)sentence&gt;
  &lt;!-- ... --&gt;
  &lt;(s)word&gt;
    &lt;(m)morph&gt;&lt;(p)syll&gt;ta&lt;/(p)syll&gt;&lt;(p)syll&gt;ble&lt;/(m)morph&gt;&lt;(m)morph&gt;s&lt;/(p)syll&gt;&lt;/(m)morph&gt;
  &lt;/(s)word&gt;
  &lt;!-- ... --&gt;
&lt;/(s)sentence&gt;
&lt;!-- ... --&gt;</pre>
                           </div>
                        </p>
                     </div>
                     <div class="figure">
                        <a name="fig-xconcur-words-cl"/>
                        <h5>Figure 10: A set of constraint expressions for the
	    XCONCUR document shown in figure <a href="#fig-xconcur-words-doc">9</a>.</h5>
                        <p>

                           <div class="codeblock">
                              <pre># rule 1
($S)word {
  (element[($P)syll] [] element[self]) ||
  (start[self] &lt;= start[($P)syll] &amp;&amp; end[($P)syll] &lt;= end[self])
} assert

# rule 2
($S)word {
  (element[($M)morph] [] element[self]) ||
  (start[($M)morph] == start[self] &amp;&amp; end[($M)morph] == end[self]) ||
  (start[($M)morph] == start[self] &amp;&amp; end[($M)morph] [] element[self]) ||
  (start[($M)morph] [] element[self] &amp;&amp; end[($M)morph] == end[self])
} assert

# rule 3
($S)word {
  element[($P)syll] // element[($M)morph]
} optional

# rule 4
* {
  element[($P)syll] ][ element[($S)word]
} reject

# rule 5
* {
  element[($M)morph] ][ element[($S)word]
} reject
</pre>
                           </div>
			
                        </p>
                     </div>
                     <p>Rule 1 asserts that a <tt class="code">syll</tt> element must be
			contained inside a <tt class="code">word</tt> element or both elements must
			be equal. The use of <tt class="code">self</tt> as a pGI substitutes the
			selected context element here. This is used to make rules less
			ambiguous.<sup>
                           <span class="highlight">
                              <a href="#tod0e1355" name="fromd0e1355">7</a>
                           </span>
                        </sup> The rule has to be interpreted as
			follows: <i>each</i> element
			<tt class="code">syll</tt> that falls into the range spanned by
			the context element <tt class="code">word</tt> must either be completely
			included (first clause of the disjunction) or be of equal range,
			or share the start or end point with the context element (second
			clause). The rule is evaluated once for each element matched by the
			context; each time, a different
			set of elements is considered. Rule 2 is analogous to rule 1 but
			works on <tt class="code">word</tt> and <tt class="code">morph</tt> elements.
		  </p>
                     <p>Rule 3 declares overlap between <tt class="code">syll</tt> and
			<tt class="code">morph</tt> elements as optional in the context of
			<tt class="code">word</tt>. Using the negation operator, one could instead
			forbid overlaps between those elements. In that case, the elements
			<tt class="code">syll</tt> and <tt class="code">morph</tt> would not be allowed to
			overlap, but could otherwise be in any possible relation.
		  </p>
                     <p>Rule 4 rejects the occurrence of <tt class="code">syll</tt> anywhere
			in the document (by means of the universal context) except inside
			<tt class="code">word</tt> elements. It is to be read as: if <i>any</i> element <tt class="code">syll</tt> is not
			contained inside the range of a <tt class="code">word</tt> element, reject
			the document. Together, rules 1 and 4 ensure that
			<tt class="code">syll</tt> elements may only occur inside of
			<tt class="code">word</tt> elements. Rule 5 implements a similar behavior
			for <tt class="code">morph</tt> elements.
		  </p>
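                     <p>How rules 1 and 2 are evaluated against the word "tables" can
			be sketched in Python. The sketch rests on several assumptions that
			are not fixed by the text: the offsets are 0-based character positions
			in the primary data, an element "falls into the range" of the context
			if the two ranges share primary data, and an
			<tt class="code">assert</tt> rule holds if every considered element
			satisfies the expression. All names are hypothetical.</p>

```python
from typing import NamedTuple


class Element(NamedTuple):
    layer: str
    name: str
    start: int
    end: int


def in_context(elem, ctx):
    """Assumed reading of 'falls into the range': the ranges share data."""
    return ctx.end > elem.start and elem.end > ctx.start


def inside(elem, ctx):
    """The [] operator for element operands: strictly within ctx."""
    return elem.start > ctx.start and ctx.end > elem.end


def rule1(syll, word):
    # Strictly inside, or within the word while possibly sharing a boundary.
    return inside(syll, word) or (
        syll.start >= word.start and word.end >= syll.end
    )


def rule2(morph, word):
    end_inside = morph.end > word.start and word.end > morph.end
    start_inside = morph.start > word.start and word.end > morph.start
    return (
        inside(morph, word)
        or (morph.start == word.start and morph.end == word.end)
        or (morph.start == word.start and end_inside)
        or (start_inside and morph.end == word.end)
    )


def assert_rule(elements, context_name, target_name, predicate):
    """Evaluate an 'assert' rule once per element matched by the context."""
    for ctx in (e for e in elements if e.name == context_name):
        for target in (e for e in elements if e.name == target_name):
            if in_context(target, ctx) and not predicate(target, ctx):
                return False
    return True


# Figure 9: "tables" = morphemes "table" + "s", syllables "ta" + "bles".
figure9 = [
    Element("s", "word", 0, 6),
    Element("m", "morph", 0, 5), Element("m", "morph", 5, 6),
    Element("p", "syll", 0, 2), Element("p", "syll", 2, 6),
]

# A syllable that runs past the end of the word violates rule 1.
broken = [
    Element("s", "word", 0, 6),
    Element("p", "syll", 0, 2), Element("p", "syll", 2, 8),
]

print(assert_rule(figure9, "word", "syll", rule1))
print(assert_rule(figure9, "word", "morph", rule2))
print(assert_rule(broken, "word", "syll", rule1))
```

                     <p>The context handling is kept separate from the rule predicates
			so that a different reading of the context range would only require a
			change to <tt class="code">in_context</tt>.</p>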
                  </div>
                  <div class="subsec2">
                     <h4>
                        <a name="t5-2-4"/>Compact Syntax</h4>
                     <p>A compact syntax for XCONCUR-CL is given in figure <a href="#fig-compact-syntax">11</a> as an EBNF grammar. The notation
			  is similar to the one used in <b>
                           <span style="font-size:85%">
                              <a href="#Xml2006" name="fromXml2006">[Bray et al. (2006a)]</a>
                           </span>
                        </b>. As with the compact syntax of RelaxNG, it is a
			  non-XML notation. An additional XML representation
			  will be provided in the future.<sup>
                           <span class="highlight">
                              <a href="#tod0e1432" name="fromd0e1432">8</a>
                           </span>
                        </sup>
                     </p>
                     <div class="figure">
                        <a name="fig-compact-syntax"/>
                        <h5>Figure 11: 
			  A compact syntax for XCONCUR-CL.<sup>
                              <span class="highlight">
                                 <a href="#tod0e1440" name="fromd0e1440">9</a>
                              </span>
                           </sup>
			
                        </h5>
                        <p>
	      

                           <div class="codeblock">
                              <pre>
  constraint-set ::= comment* rule (rule | comment)*
            rule ::= context "{" expression "}" rule-modifier?
         comment ::= "#" Any character valid for comments
         context ::= "*" | tag-identifier (tag-identifier)*
   rule-modifier ::= "assert" | "reject" | "optional"
      expression ::= basic-expression
                     | expression connector expression
                     | "!" expression
                     | "(" expression ")"
basic-expression ::= operand operator operand
       connector ::= "&amp;&amp;" | "||"
        operator ::= basic-operator | derived-operator
  basic-operator ::= "&lt;&lt;" | "=="
derived-operator ::= "&gt;&gt;" | "&lt;=" | "=&gt;" | "[]" | "][" | "//"
         operand ::= ("start" | "end" | "element")
                     "[" (tag-identifier | "self" ) "]"
  tag-identifier ::= "(" layer-variable ")" element-name
  layer-variable ::= "$" [A-Z] ([A-Z] | [0-9])*
    element-name ::= A QName as defined in Bray et al. (2006b)</pre>
                           </div>
	  
                        </p>
                     </div>
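                        <p>The lexical productions of the grammar translate directly
			  into regular expressions. The following Python code is a sketch
			  under assumptions: <tt class="code">parse_operand</tt> is a
			  hypothetical helper, the element name is approximated by an
			  ASCII-only pattern rather than the full QName production, and no
			  whitespace is accepted inside an operand.</p>

```python
import re

# layer-variable ::= "$" [A-Z] ([A-Z] | [0-9])*
LAYER_VARIABLE = r"\$[A-Z][A-Z0-9]*"

# ASCII-only approximation of a QName (an assumption of this sketch).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
ELEMENT_NAME = rf"{NCNAME}(?::{NCNAME})?"

# tag-identifier ::= "(" layer-variable ")" element-name
TAG_IDENTIFIER = rf"\({LAYER_VARIABLE}\){ELEMENT_NAME}"

# operand ::= ("start" | "end" | "element") "[" (tag-identifier | "self") "]"
OPERAND = re.compile(rf"(start|end|element)\[({TAG_IDENTIFIER}|self)\]")


def parse_operand(text):
    """Return (kind, argument) for a valid operand, otherwise None."""
    match = OPERAND.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_operand("start[($S)word]"))
print(parse_operand("element[self]"))
print(parse_operand("element[($P)syll"))  # closing bracket missing
```

                        <p>A full parser would additionally handle expressions,
			  connectors and rule modifiers, for example by recursive descent over
			  the remaining productions.</p>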
                  </div>
               </div>
            </div>
            <div class="section">
               <h2>
                  <a name="t6"/>GENAU and XCONCUR</h2>
               <p>The sections <a href="#SingleRootedTrees">&#8220;Transforming Single Rooted Trees&#8221;</a> and <a href="#TransformingAnnotationGraphs">&#8220;Transforming Annotation Graphs&#8221;</a> demonstrate how different types
      of linguistic corpora can be transformed to a set of separately
      annotated, primary-data-identical XML documents.  These XML documents
      serve as the GENAU format. Since the XCONCUR document format can be
      understood as an interwoven set of primary-data-identical XML documents,
      the transformation from GENAU to XCONCUR is trivial and lossless.
      Furthermore, an XCONCUR document can be split up again and transformed
      back via GENAU into its original representation &#8211; or even into a
      different one.</p>
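               <p>Whether a set of documents is primary-data-identical, which is the precondition for interweaving them, can be checked by removing the markup and comparing the remaining character sequences. The following Python sketch uses the standard <tt class="code">xml.etree.ElementTree</tt> module; the helper <tt class="code">build_layer</tt> and the sample text are illustrative assumptions, not GENAU tooling or corpus data.</p>
               <div class="codeblock">
<pre>
```python
import xml.etree.ElementTree as ET
from typing import Sequence


def primary_data(root: ET.Element) -> str:
    """Return the character data of a document with all markup removed."""
    return "".join(root.itertext())


def primary_data_identical(*roots: ET.Element) -> bool:
    """True if every document yields the same character sequence."""
    return len({primary_data(root) for root in roots}) == 1


def build_layer(element_name: str, chunks: Sequence[str]) -> ET.Element:
    """Hypothetical helper: one annotation layer with a body root element
    and one child element per chunk of primary data."""
    body = ET.Element("body")
    for chunk in chunks:
        ET.SubElement(body, element_name).text = chunk
    return body


# Illustrative layers over the same text, segmented differently.
words = build_layer("w", ["Mira."])
syllables = build_layer("syll", ["Mi", "ra."])
print(primary_data_identical(words, syllables))
```
</pre>
               </div>
               <p>Both layers reduce to the same five characters, so the check succeeds even though the element boundaries differ.</p>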
               <p>The XCONCUR representation allows for the utilisation of XCONCUR-CL to
		formulate constraint rules, which express the relations between the
		elements of different trees in the multi-rooted-trees data model. On
		the one hand, this serves as a mechanism to validate the markup and
		can improve corpora quality. On the other hand, these constraint rules
		allow for the formulation of principles which are the result of a deeper
		analysis.</p>
               <p>
		Figure <a href="#fig-tusnelda-reprise">12</a> shows an
		excerpt of figure <a href="#fig-xconcur-example">8</a>. The following constraint rules
		can be applied to ensure that <tt class="code">tok</tt> elements span
		over equal ranges:
<div class="codeblock">
                     <pre>($L1)body {
  start[($L1)tok] == start[($L2)tok] &amp;&amp; end[($L1)tok] == end[($L2)tok]
} assert

($L2)body {
  start[($L1)tok] == start[($L2)tok] &amp;&amp; end[($L1)tok] == end[($L2)tok]
} assert</pre>
                  </div>
	The same holds for the <tt class="code">body</tt> elements; the
	corresponding constraint rules are omitted here because they are
	analogous to the rules shown above.
      </p>
               <div class="figure">
                  <a name="fig-tusnelda-reprise"/>
                  <h5>Figure 12: Excerpt from the Uppsala corpus (see figure <a href="#fig-xconcur-example">8</a>)</h5>
                  <pre>&lt;!-- ... --&gt;
&lt;(l1)tok id="SGID0201.1.1" n="1"&gt;
  &lt;(l2)tok id="SGID0201.1.1" n="1"&gt;
    &lt;(l1)orth&gt;
      &lt;(l2)orth&gt;Kakoj&lt;/(l2)orth&gt;
    &lt;/(l1)orth&gt;
    &lt;(l1)pos leveler:text="pronoun"/&gt;
    &lt;(l1)desc&gt;
      &lt;(l1)feature type="subpos" leveler:text="interrogative"/&gt;
      &lt;(l1)feature type="tag" leveler:text="pronomen_int_nom_sg_masc_adj"/&gt;
      &lt;(l1)feature type="syntactic type" leveler:text="adjectival"/&gt;
      &lt;(l1)lemma leveler:text="kakoj"/&gt;
      &lt;(l1)case leveler:text="nominative"/&gt;
      &lt;(l1)gender leveler:text="masculine"/&gt;
      &lt;(l1)number leveler:text="singular"/&gt;
    &lt;/(l1)desc&gt;
  &lt;/(l2)tok&gt;
&lt;/(l1)tok&gt;
&lt;!-- ... --&gt;</pre>
               </div>
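               <p>The effect of the two rules for <tt class="code">tok</tt> can also be illustrated outside of XCONCUR-CL. The following Python sketch is one possible reading of the constraint, not the XCONCUR-CL evaluation algorithm: it assumes that each element has already been reduced to its layer, its name, and the character offsets of its start and end in the primary data, and it reports the ranges that are covered by a <tt class="code">tok</tt> element on only one of the two layers. The type <tt class="code">Span</tt>, the function names and the offsets are hypothetical.</p>
               <div class="codeblock">
<pre>
```python
from typing import Iterable, NamedTuple, Set, Tuple


class Span(NamedTuple):
    layer: str   # layer identifier, e.g. "l1"
    name: str    # element name, e.g. "tok"
    start: int   # offset of the first character covered
    end: int     # offset just past the last character covered


def ranges(spans: Iterable[Span], layer: str, name: str) -> Set[Tuple[int, int]]:
    """Collect the (start, end) ranges of all `name` elements on one layer."""
    return {(s.start, s.end) for s in spans if s.layer == layer and s.name == name}


def unmatched(spans: Iterable[Span], name: str,
              layer_a: str, layer_b: str) -> Set[Tuple[int, int]]:
    """Ranges covered by a `name` element on exactly one of the two layers.
    An empty result means start and end agree for every such element."""
    spans = list(spans)  # allow the input to be traversed twice
    return ranges(spans, layer_a, name) ^ ranges(spans, layer_b, name)


# Assumed offsets: "Kakoj" occupies characters 0 to 5 on both layers.
consistent = [Span("l1", "tok", 0, 5), Span("l2", "tok", 0, 5)]
# A deliberately broken variant: the second layer ends one character early.
broken = [Span("l1", "tok", 0, 5), Span("l2", "tok", 0, 4)]

print(unmatched(consistent, "tok", "l1", "l2"))
print(sorted(unmatched(broken, "tok", "l1", "l2")))
```
</pre>
               </div>
               <p>For the consistent layers the result is empty; for the broken variant both <tt class="code">(0, 4)</tt> and <tt class="code">(0, 5)</tt> are reported, since neither range has a counterpart on the other layer.</p>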
               <p>Similar constraints can be applied to an XCONCUR
      representation of an EXMARaLDA document. An example of an EXMARaLDA
      document that has been converted to XCONCUR is shown in figure <a href="#fig-exmeralda-example">13</a>. For example, one might want
      to assert that <tt class="code">orth</tt> elements span the same
      range as the <tt class="code">syll</tt> elements.
      </p>
               <div class="figure">
                  <a name="fig-exmeralda-example"/>
                  <h5>Figure 13: An excerpt of the XCONCUR version of the E3 corpus (SFB 538, Hamburg University)</h5>
                  <pre>&lt;?xconcur version="1.1" encoding="utf-8"?&gt;
&lt;?xconcur-schema layer-id="l1" root="body" system="exmeralda1.dtd"?&gt;
&lt;?xconcur-schema layer-id="l2" root="body" system="exmeralda2.dtd"?&gt;
&lt;(l1)body&gt;&lt;(l2)body&gt;
  &lt;(l1)orth id="TIE1.a0" s="TLI0" e="TLI1" value="S&#237;."&gt;
    &lt;(l2)syll id="TIE2.a0" s="TLI0" e="TLI1" value="[CV]"&gt;
      &lt;(l1)ts number="1" speaker="CHI" tier="TIE0" n="sc" id="TIE0.sc0" s="TLI0" e="TLI1"&gt;
        &lt;(l2)ts number="1" speaker="CHI" tier="TIE0" n="sc" id="TIE0.sc0" s="TLI0" e="TLI1"&gt;
          &lt;(l1)ts n="e" id="TIE0.e0" s="TLI0" e="TLI1"&gt;
            &lt;(l2)ts n="e" id="TIE0.e0" s="TLI0" e="TLI1"&gt;[d&#643;&#618;]&lt;/(l2)ts&gt;
          &lt;/(l1)ts&gt;
        &lt;/(l2)ts&gt;
      &lt;/(l1)ts&gt;
    &lt;/(l2)syll&gt;
  &lt;/(l1)orth&gt;
  &lt;(l1)orth id="TIE1.a1" s="TLI2" e="TLI3" value="Mira."&gt;
    &lt;(l2)syll id="TIE2.a1" s="TLI2" e="TLI3" value="[CV.CV:]"&gt;
         &lt;(l1)ts number="2" speaker="CHI" tier="TIE0" n="sc" id="TIE0.sc1" s="TLI2" e="TLI3"&gt;
           &lt;(l2)ts number="2" speaker="CHI" tier="TIE0" n="sc" id="TIE0.sc1" s="TLI2" e="TLI3"&gt;
              &lt;(l1)ts n="e" id="TIE0.e1" s="TLI2" e="TLI3"&gt;
                 &lt;(l2)ts n="e" id="TIE0.e1" s="TLI2" e="TLI3"&gt;[mi.&#633;&#592;&#720;]&lt;/(l2)ts&gt;
              &lt;/(l1)ts&gt;
           &lt;/(l2)ts&gt;
         &lt;/(l1)ts&gt;
     &lt;/(l2)syll&gt;
   &lt;/(l1)orth&gt;
   &lt;!-- ... --&gt;
&lt;/(l2)body&gt;&lt;/(l1)body&gt;</pre>
               </div>
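               <p>The assertion that <tt class="code">orth</tt> and <tt class="code">syll</tt> elements span the same ranges can be sketched for the excerpt in figure 13. The Python fragment below assumes that the <tt class="code">s</tt> and <tt class="code">e</tt> attributes name the start and end anchors of an element on a common timeline; the tuples are copied from the figure, while the function names are hypothetical. It lists the anchor pairs that are not spanned by both an <tt class="code">orth</tt> and a <tt class="code">syll</tt> element.</p>
               <div class="codeblock">
<pre>
```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Element = Tuple[str, str, str, str]  # (element name, id, s, e)

# Attribute values copied from the orth and syll elements in Figure 13.
ELEMENTS: List[Element] = [
    ("orth", "TIE1.a0", "TLI0", "TLI1"),
    ("syll", "TIE2.a0", "TLI0", "TLI1"),
    ("orth", "TIE1.a1", "TLI2", "TLI3"),
    ("syll", "TIE2.a1", "TLI2", "TLI3"),
]


def names_by_range(elements: Iterable[Element]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (s, e) anchor pair to the element names that span it."""
    grouped: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for name, _ident, start, end in elements:
        grouped[(start, end)].add(name)
    return dict(grouped)


def incomplete_ranges(
    elements: Iterable[Element],
    required: FrozenSet[str] = frozenset({"orth", "syll"}),
) -> List[Tuple[str, str]]:
    """Anchor pairs that lack at least one of the required element names."""
    grouped = names_by_range(elements)
    return sorted(pair for pair, names in grouped.items()
                  if not required.issubset(names))


print(incomplete_ranges(ELEMENTS))
```
</pre>
               </div>
               <p>For the four elements of the excerpt the list is empty. Dropping the last <tt class="code">syll</tt> entry would cause the pair <tt class="code">("TLI2", "TLI3")</tt> to be reported.</p>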
            </div>
            <div class="section">
               <h2>
                  <a name="t7"/>Conclusion</h2>
               <p>
		This paper presents an approach for the conversion of complex
		linguistic resources into a generalized data representation. This
		representation can be transformed into an XCONCUR document, so that
		XCONCUR-CL rules can be used not only to check the validity of each
		individual annotation layer but also to enforce constraints on
		the interaction of different layers.
	</p>
               <p>
		The data we have analysed in the project so far is structured in such
		a way that most relations between annotation layers are instances of
		the "equals" relation. We did not observe any kind of overlap, as a large
		part of the data (i.e., the corpora annotated in the Tusnelda format)
		originates from single-layered XML documents. In addition, the
		EXMARaLDA data does not include information from multiple linguistic
		levels of description. However, since we are interested in a general
		information modelling approach, we did not focus solely on the overlap
		problem.  The option to define different relations between annotation
		layers, or, to be precise, between trees, leads to an increase in
		quality, since a document can now be validated from more than a
		single perspective. As a result, a new class of erroneous data can be
		detected.
	</p>
               <p>
		If a new information model for a corpus is to be designed from
		scratch, one could, besides using or defining the appropriate
		annotation schemata for each annotation layer, apply the full
		expressiveness of XCONCUR-CL to model the relations between the
		different trees in the multi-rooted data model of XCONCUR.
	</p>
            </div>
            <h3 class="footnotes">Notes</h3>
            <table class="footnotes">
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e82" name="tod0e82">
                           <b>1.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">Indeed, linguistic data has received attention by the markup
        community for many years now (e.g., <b>
                           <span style="font-size:85%">
                              <a href="#p3" name="fromp3">[Sperberg-McQueen &amp; Burnard (1994)]</a>
                           </span>
                        </b>, <b>
                           <span style="font-size:85%">
                              <a href="#Witt1998" name="fromWitt1998">[Witt (1998)]</a>
                           </span>
                        </b>, <b>
                           <span style="font-size:85%">
                              <a href="#Rehm1999" name="fromRehm1999">[Rehm (1999)]</a>
                           </span>
                        </b>)</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e385" name="tod0e385">
                           <b>2.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">
              
                        <a href="http://coli.lili.uni-bielefeld.de/Texttechnologie/Forschergruppe/Phase1/sekimo/python/" target="_blank">http://coli.lili.uni-bielefeld.de/Texttechnologie/Forschergruppe/Phase1/sekimo/python/</a>
            
                     </p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e573" name="tod0e573">
                           <b>3.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">Two documents are considered to
      have identical primary data if one can remove all annotations (markup)
      and obtain the same sequence of characters.</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e615" name="tod0e615">
                           <b>4.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">DOCTYPE
		  declarations were used in earlier XCONCUR versions. As they are
		  specific to DTDs, they are considered deprecated.</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e642" name="tod0e642">
                           <b>5.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">Since the occurrence of the
		  <tt class="code">orth</tt> element in a <tt class="code">tok</tt> element is optional,
		  a further optimization to reduce the size of the annotation could be
		  to encode the <tt class="code">orth</tt> element on a distinct layer
		  only.</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e669" name="tod0e669">
                           <b>6.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">For example, consider the annotation of a
		  document on three annotation layers: "syntactic layer" (sentence,
		  words), "phonological layer" (syllables), "morphological layer"
		  (morphemes). One can assert that syllables and morphemes are always
		  contained in words, but they may or may not overlap each other.
		  Therefore, a cross-layer method is necessary for proper
		  validation.</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e1355" name="tod0e1355">
                           <b>7.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">For example, suppose
			that <tt class="code">($S)word</tt> were used instead of
			<tt class="code">self</tt>. It would then be unclear whether the <tt class="code">word</tt>
			element is the current context element or another word element
			inside the range spanned by the context
			element.</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e1432" name="tod0e1432">
                           <b>8.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">The XML syntax for XCONCUR-CL defined
			  in <b>
                           <span style="font-size:85%">
                              <a href="#Schonefeld2006" name="fromSchonefeld2006">[Schonefeld &amp; Witt (2006)]</a>
                           </span>
                        </b> is deprecated, since it
			  does not support the new features of
			  XCONCUR-CL.</p>
                  </td>
               </tr>
               <tr>
                  <td class="ftnote-num" valign="top" width="10%" align="right">
                     <p class="first">
                        <a href="#fromd0e1440" name="tod0e1440">
                           <b>9.</b>
                        </a>
                     </p>
                  </td>
                  <td valign="top">
                     <p class="first">Here, the rules for
				  <tt class="code">comment</tt> and <tt class="code">element-name</tt> are
				  defined in prose rather than formally.</p>
                  </td>
               </tr>
            </table>
            <hr class="hr"/>
            <h3>
               <i>Bibliography</i>
            </h3>
            <p>
               <b>
                  <a name="Alink2006" href="#fromAlink2006">[Alink et al. (2006)] </a>
               </b> Wouter Alink, Valentin Jijkoun, David Ahn, Maarten de
          Rijke, Peter Boncz, Arjen de Vries: <i>Representing and Querying Multi-Dimensional Markup
            for Question Answering</i>. In: Proceedings of the 5th
          Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional
          Markup in Natural Language Processing, Trento, 2006.</p>
            <p>
               <b>
                  <a name="Bird_Liberman_01" href="#fromBird_Liberman_01">[Bird &amp; Liberman (2001)] </a>
               </b> Steven Bird, Mark Liberman:
        <i>A Formal Framework for Linguistic
        Annotation</i>.  In: Speech Communication 33,1/2, 2001.</p>
            <p>
               <b>
                  <a name="Css2006" href="#fromCss2006">[Bos et al. (2006)] </a>
               </b> Bert Bos, Tantek Celik, Ian Hickson, Haakon Wium Lie:
          <i>Cascading Style Sheets, Level 2
            Revision 1</i>, World Wide Web Consortium, 2006.</p>
            <p>
               <b>
                  <a name="XmlNamespaces2006" href="#fromXmlNamespaces2006">[Bray et al. (2006b)] </a>
               </b> Tim Bray, Dave Hollander, Andrew
         Layman, Richard Tobin: <i>Namespaces in XML
         1.1</i>, World Wide Web Consortium, 2006.</p>
            <p>
               <b>
                  <a name="Xml2006" href="#fromXml2006">[Bray et al. (2006a)] </a>
               </b> Tim Bray, Jean Paoli, C. M. Sperberg-McQueen,
          Eve Maler, Francois Yergeau, John Cowan:
          <i>Extensible Markup Language
            (XML) 1.1</i>. World Wide Web Consortium,
          2006, 2nd edition.</p>
            <p>
               <b>
                  <a name="Carletta_et_al2003" href="#fromCarletta_et_al2003">[Carletta et al. (2003)] </a>
               </b> Jean Carletta, Jonathan
		Kilgour, Tim O'Donnell, Stefan Evert, and Holger Voormann: <i>The NITE Object Model Library for Handling Structured
		Linguistic Annotation on Multimodal Data Sets</i>. In:
		Proceedings of the EACL Workshop on Language Technology and the
		Semantic Web (3rd Workshop on NLP and XML, NLPXML-2003), 2003.
        </p>
            <p>
               <b>
                  <a name="Chiarcos2007" href="#fromChiarcos2007">[Chiarcos (2007)] </a>
               </b> Christian Chiarcos: <i>An Ontology of Linguistic Annotation: Word Classes and
        Morphology</i>. In: Proceedings of DIALOG 2007, Toronto,
        2007.</p>
            <p>
               <b>
                  <a name="Dipper2006" href="#fromDipper2006">[Dipper et al. (2006)] </a>
               </b> Stefanie Dipper, Erhard Hinrichs,
		  Thomas Schmidt, Andreas Wagner, Andreas Witt: <i>Sustainability of Linguistic Resources</i>.
		  In: Erhard Hinrichs, Nancy Ide, Martha Palmer, and James Pustejovsky
		  (eds.): Proceedings of the LREC 2006 Satellite Workshop on "Merging
		  and Layering Linguistic Information", Genoa, 2006.
		</p>
            <p>
               <b>
                  <a name="Eckart2007" href="#fromEckart2007">[Eckart &amp; Teich (2007)] </a>
               </b> Richard Eckart, Elke Teich:
        <i>An XML-Based Data Model for Flexible
        Representation and Query of Linguistically Interpreted
        Corpora</i>. In: Georg Rehm, Andreas Witt, Lothar Lemnitzer
        (eds.), Data Structures for Linguistic Resources and Applications,
        Gunter Narr Verlag, T&#252;bingen, 2007. pp. 327&#8211;336.</p>
            <p>
               <b>
                  <a name="Goldfarb1990" href="#fromGoldfarb1990">[Goldfarb (1990)] </a>
               </b> Charles F. Goldfarb: <i>The SGML
          Handbook</i>. Clarendon Press, Oxford, 1990.</p>
            <p>
               <b>
                  <a name="Hilbert2005" href="#fromHilbert2005">[Hilbert et al. (2005)] </a>
               </b> Mirco Hilbert, Oliver
        Schonefeld, Andreas Witt: <i>Making CONCUR
        work</i>.  In: Proceedings of Extreme Markup Languages,
        Montreal, 2005.</p>
            <p>
               <b>
                  <a name="wl2007" href="#fromwl2007">[Lehmberg &amp; W&#246;rner (to appear)] </a>
               </b> Timm Lehmberg, Kai
        W&#246;rner: <i>Annotation Standards</i>.  In:
        A. L&#252;deling and M. Kyt&#246;, Corpus Linguistics, HSK, de Gruyter,
        Berlin/New York, in press.
        </p>
            <p>
               <b>
                  <a name="Loenngren1993" href="#fromLoenngren1993">[L&#246;nngren et al. (1993)] </a>
               </b> Lennart L&#246;nngren (ed.):
		<i>Chastotnyj slovar' sovremennogo russkogo
		jazyka. (A Frequency Dictionary of Modern Russian. With a Summary in
		English.)</i>, Acta Universitatis Upsaliensis, Studia Slavica
		Upsaliensia 32, Uppsala, 1993.</p>
            <p>
               <b>
                  <a name="Rehm1999" href="#fromRehm1999">[Rehm (1999)] </a>
               </b> Georg Rehm: <i>Automatische Textannotation: Ein SGML- und
        DSSSL-basierter Ansatz zur angewandten Textlinguistik</i>. In:
        H. Lobin (ed.): Text im digitalen Medium, Wiesbaden, Westdeutscher
        Verlag, 1999.</p>
            <p>
               <b>
                  <a name="Rehm2007" href="#fromRehm2007">[Rehm et al. (2007)] </a>
               </b> Georg Rehm, Richard Eckart,
        Christian Chiarcos: <i>An OWL- and XQuery-Based
        Mechanism for the Retrieval of Linguistic Patterns from
        XML-Corpora</i>.  In: Proceedings of Recent Advances in
        Natural Language Processing (RANLP 2007), Borovets, Bulgaria, 2007.</p>
            <p>
               <b>
                  <a name="Renear_93" href="#fromRenear_93">[Renear et al. (1993)] </a>
               </b> Allen Renear, Elli Mylonas, David
        Durand: <i>Refining our Notion of What Text
        Really Is: The Problem of Overlapping
        Hierarchies</i>. <a href="http://www.stg.brown.edu/resources/stg/monographs/ohco.html" target="_blank">http://www.stg.brown.edu/resources/stg/monographs/ohco.html</a>,
        1993.</p>
            <p>
               <b>
                  <a name="Schmidt2001" href="#fromSchmidt2001">[Schmidt (2001)] </a>
               </b> Thomas Schmidt: <i>The Transcription System EXMARaLDA: An Application of the
        Annotation Graph Formalism as the Basis of a Database of Multilingual
        Spoken Discourse</i>. In: S. Bird, P. Buneman, M. Liberman:
        Proceedings of the IRCS Workshop on Linguistic Databases,
        2001. pp. 219&#8211;227.
        </p>
            <p>
               <b>
                  <a name="Schmidt2005" href="#fromSchmidt2005">[Schmidt (2005)] </a>
               </b> Thomas Schmidt: <i>Time-Based Data Models and the Text Encoding Initiative's
        Guidelines for Transcription of Speech</i>. In: Arbeiten
        zur Mehrsprachigkeit (Working Papers in Multilingualism), Serie B
        (62), Hamburg, 2005.
        </p>
            <p>
               <b>
                  <a name="Schmidt2006" href="#fromSchmidt2006">[Schmidt et al. (2006)] </a>
               </b> Thomas Schmidt, Christian
        Chiarcos, Timm Lehmberg, Georg Rehm, Andreas Witt, Erhard Hinrichs:
        <i>Avoiding Data Graveyards: From Heterogeneous
        Data Collected in Multiple Research Projects to Sustainable Linguistic
        Resources</i>. In: Proceedings of the E-MELD Workshop 2006,
        June 22, 2006, Ypsilanti.</p>
            <p>
               <b>
                  <a name="Schonefeld2006" href="#fromSchonefeld2006">[Schonefeld &amp; Witt (2006)] </a>
               </b> Oliver Schonefeld,
        Andreas Witt: <i>Towards Validation of Concurrent
        Markup</i>. In: Proceedings of Extreme Markup Languages,
        Montreal, 2006.</p>
            <p>
               <b>
                  <a name="Schonefeld2007" href="#fromSchonefeld2007">[Schonefeld (2007)] </a>
               </b> Oliver Schonefeld: <i>XCONCUR and XCONCUR-CL: A Constraint-Based Approach for
        the Validation of Concurrent Markup</i>. In: Georg Rehm,
        Andreas Witt, Lothar Lemnitzer (eds.), Data Structures for Linguistic
        Resources and Applications, Gunter Narr Verlag, T&#252;bingen,
        2007. pp. 347&#8211;356.</p>
            <p>
               <b>
                  <a name="Sgml1986" href="#fromSgml1986">[SGML (1986)] </a>
               </b> ISO 8879:1986: <i>Text and Office
          Systems &#8211; Standard Generalized Markup Language
          (SGML)</i>. International Organization for
          Standardization, Geneva, 1986.</p>
            <p>
               <b>
                  <a name="p3" href="#fromp3">[Sperberg-McQueen &amp; Burnard (1994)] </a>
               </b> C. M. Sperberg-McQueen, Lou Burnard (eds.):
          <i>Guidelines for Electronic Text Encoding 
            and Interchange (TEI P3)</i>. 
          Chicago, Oxford: Text Encoding Initiative, 1994. 
        </p>
            <p>
               <b>
                  <a name="SpergbergMcQueen2006" href="#fromSpergbergMcQueen2006">[Sperberg-McQueen (2006)] </a>
               </b> C. M. Sperberg-McQueen:
		  <i>Rabbit/Duck Grammars: A Validation Method
		  for Overlapping Structures</i>. In: Proceedings of Extreme
		  Markup Languages, Montreal, 2006.</p>
            <p>
               <b>
                  <a name="Tennison2002" href="#fromTennison2002">[Tennison &amp; Piez (2002)] </a>
               </b> Jeni Tennison, Wendell
		  Piez: <i>The Layered Markup and Annotation
		  Language (LMNL)</i>. In: Proceedings of Extreme Markup
		  Languages, Montreal, 2002.</p>
            <p>
               <b>
                  <a name="Tennison2007" href="#fromTennison2007">[Tennison (2007)] </a>
               </b> Jeni Tennison: <i>Creole: Validating Overlapping
			Markup</i>. In: Proceedings of XTech 2007, Paris,
			2007.</p>
            <p>
               <b>
                  <a name="Wagner2005" href="#fromWagner2005">[Wagner (2005)] </a>
               </b> Andreas Wagner: <i>Unity in Diversity: Integrating Differing Linguistic Data
        in TUSNELDA</i>. In: S. Dipper, M. G&#246;tze, M. Stede:
        Heterogeneity in Focus: Creating and Using Linguistic Databases, ISIS,
        Working Papers of the SFB 632, Potsdam, 2005.</p>
            <p>
               <b>
                  <a name="Witt1998" href="#fromWitt1998">[Witt (1998)] </a>
               </b> Andreas Witt: <i>TEI-based XML-Applications: Transcriptions</i>.
		  In: ALLC-ACH 1998, Joint Conference of the ALLC and ACH, Debrecen,
		  1998.</p>
            <p>
               <b>
                  <a name="Witt04" href="#fromWitt04">[Witt (2004)] </a>
               </b> Andreas Witt: <i>Multiple Hierarchies: New Aspects of an old
		Solution</i>. In: Proceedings of Extreme Markup Languages,
		Montreal, 2004.
		</p>
            <p>
               <b>
                  <a name="Witt2005" href="#fromWitt2005">[Witt et al. (2005)] </a>
               </b> Andreas Witt, Daniela Goecke, Felix Sasaki, Harald
          L&#252;ngen: <i>Unification of XML Documents
            with Concurrent Markup</i>. In: Literary and Linguistic
          Computing 20(1), 2005. pp. 103&#8211;116.</p>
            <p>
               <b>
                  <a name="Woerner2006" href="#fromWoerner2006">[W&#246;rner et al. (2006)] </a>
               </b> Kai W&#246;rner, Andreas Witt, Georg
        Rehm, Stefanie Dipper: <i>Modelling Linguistic
        Data Structures</i>.  In: Proceedings of Extreme Markup
        Languages, Montreal, 2006.</p>
            <hr class="hr"/>
            <p class="footertitle">On the Lossless Transformation of Single-File,
    Multi-Layer Annotations into Multi-Rooted Trees</p>
            <address>Andreas Witt [University of T&#252;bingen]</address>
            <address>Oliver Schonefeld [University of Bielefeld]</address>
            <address>Georg Rehm [University of T&#252;bingen]</address>
            <address>Jonathan Khoo [University of T&#252;bingen]</address>
            <address>Kilian Evang [University of T&#252;bingen]</address>
            <hr class="hr"/>
         </div>
      </div>
   </body>
</html>