A Logic Approach for MPEG-7 XML Document Queries

Peiya Liu
Amit Chakraborty
Liang H. Hsu


Many query languages are currently being proposed for specifying XML document retrievals. The expressive power and usefulness of these query languages depend on their embedded formalisms and intended XML document applications. The emerging MPEG-7 multimedia standard uses XML Schema datatypes for multimedia content descriptions and poses an interesting challenge to XML query language design. Most XML query language proposals have limitations in specifying queries for this type of XML document. In this paper, we identify some critical specification issues in MPEG-7 XML queries and propose an XML query language, MMDOC-QL, with multimedia query constructs. MMDOC-QL is based on a logic formalism called path predicate calculus. In this calculus, the atomic logic formulas are element predicates rather than the relation predicates of relational calculus. Queries in this calculus are equivalent to finding all proofs of the existential closure of logical assertions, in the form of path predicates, that the document tree elements must satisfy. Spatial, temporal and visual datatypes and relationships can also be described in this formalism for content retrieval.

Keywords: Querying; Datatyping

Peiya Liu

Dr. Peiya Liu is a senior member of technical staff in the Multimedia Documentation Program, Siemens Corporate Research, Inc. He has many years of experience in applications of multimedia documents in industrial environments. His primary interests are in the areas of multimedia document authoring/processing/management, multimedia tools, innovative industrial applications and standards. He was one of the program co-chairs for the first joint IEEE International Conference on Multimedia and Expo and has served as a program committee member for many conferences. He is currently the standards section editor of IEEE Multimedia and is on the editorial board of the Kluwer Journal of Multimedia Tools and Applications.

Amit Chakraborty

Amit Chakraborty received his B.Tech and M.Tech degrees in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, India in 1988 and 1990 respectively. He was awarded the Institute Silver Medal on both occasions for being ranked highest in the graduating class. He received his MS, M.Phil and Ph.D in Electrical Engineering from Yale University, respectively in 1991, 1993 and 1996. His Ph.D focused on using game theoretic methods for Image Segmentation with applications in Medical Imaging. Since early 1996 Dr. Chakraborty has been a Member of Technical Staff at Siemens Corporate Research, Princeton, New Jersey. He is an MPEG-7 expert and his research interests include computer vision, image analysis, medical imaging, video processing and segmentation, wavelets and other aspects of image processing related to multimedia systems.

Liang H. Hsu

Liang H. Hsu is a distinguished member of technical staff and Head of the Multimedia Documentation Program, Siemens Corporate Research, Inc. Liang received his MS degree in computer science from the University of Pittsburgh in 1983. Prior to joining Siemens, he worked for Digital Equipment Corporation in field service support. His current interests include conversion of legacy documents into SGML/XML, SGML/XML-based document composition and hyperlinking for complex products, multimedia document delivery mechanisms, and browsing and navigation support for product service-related applications.

Peiya Liu [Senior Member of Technical Staff; Siemens Corporate Research, Inc., Multimedia Documentation Program]
Amit Chakraborty [Member of Technical Staff; Siemens Corporate Research, Inc., Multimedia Documentation Program]
Liang H. Hsu [Manager and Distinguished Member of Technical Staff; Siemens Corporate Research, Inc., Multimedia Documentation Program]

Extreme Markup Languages 2001® (Montréal, Québec)

Copyright © 2001 Peiya Liu, Amit Chakraborty, and Liang H. Hsu. Reproduced with permission.


MPEG-7 is an emerging ISO/IEC standard, formally named “Multimedia Content Description Interface”. Unlike the previous MPEG [MPEG Web Site] compression standards MPEG-1, MPEG-2 and MPEG-4, MPEG-7 aims to create a standard for describing multimedia content in order to integrate the production, distribution and content access paradigms. The MPEG-7 standard uses XML Schema to describe multimedia objects such as video, audio and images as spatial, temporal or visual XML datatypes. Multimedia XML documents of this type may include descriptions of both static/spatial media (such as text, graphics, drawings and images) and time-based media (such as video, audio and animation). The content can be further organized into three major document structures: hierarchical, hyperlinked, and temporal/spatial structures. MPEG-7 poses many interesting challenges in designing XML query languages to cover different aspects of XML documents.

Many document query languages, such as [SDQL 96], [XML-QL 99], [Lorel 00], [YATL 98], [XQL 98] and the recent W3C [XQuery], have been proposed for document retrieval. However, these languages cannot adequately support MPEG-7 XML document queries, because they have limited expressive power over XML datatypes for specifying “intensional” data and relationships inside XML documents. This has limited the use of query languages in XML document retrieval. An ideal XML query language should support different aspects of XML structures and datatypes.

In this paper, we identify several critical issues in MPEG-7 XML query specifications, namely “intensional data and relationships specifications”, “document addressing specifications” and “co-occurrence constraints specifications”. We tackle these issues by using a logic formalism, called Path Predicate Calculus [Liu2 00], with multimedia query constructs in our XML query language, MMDOC-QL, for specifying spatial and temporal relationships to support MPEG-7 XML document retrieval and modification. We intend to open up a new direction of research in establishing logic formalisms and XML datatype specifications in query language design.

There are several advantages to this approach. First, these critical issues are tackled within the same logic framework. In the past, two formalisms have often been used for describing query languages in relational models [Ullman 88]: (1) an algebraic formalism, called relational algebra, and (2) a logic formalism, called relational calculus, which includes tuple relational calculus [Codd 72] and domain relational calculus [Piottee 78]. However, because the underlying relational data model differs from the document model, these formalisms cannot be used directly as formalisms for XML query languages. In Path Predicate Calculus, the atomic logic formulae are element predicates for asserting logic statements about document elements in a document tree, and queries are equivalent to finding all proofs of the existential closure of the logical assertions that document elements must satisfy. This paper will show that many spatial/temporal/visual operations can be expressed in such a logic formalism. The relational calculus is a special case of this logic form: when applied to "flat" data-oriented documents, element predicates degenerate into relational predicates as in relational models.

Second, it provides "non-procedurality" of document queries. Historically, calculus-based relational query languages have been more prevalent than algebraic languages because of the declarative character of the logic formalism. The algebraic approach, taken by the W3C Query Working Group [XQuery] [XML Query Algebra], often needs to describe explicitly the order of operations on the underlying data model to express a query. The logic formalism provides a higher-level notion for expressing queries, since query processing is a logical computation that finds all proofs of the logic query statements. In particular, it is easier to express co-occurrence constraints on XML elements, and the formalism integrates with query constructs for specifying multimedia object relationships when querying multimedia content descriptions. The path predicate approach can also work directly on the XML document model rather than on a specific data model of documents.

The rest of the paper is organized as follows. Section 2 describes MPEG-7 XML documents and the specification of multimedia objects as temporal/audio/visual datatypes, identifies issues in the design of MPEG-7 query language constructs, and presents the proposed query language MMDOC-QL and its embedded path predicate calculus. Section 3 provides a more complicated example of an MPEG-7 structured content query in MMDOC-QL. Section 4 discusses related work in multimedia documents and query languages. Section 5 provides some concluding remarks.

MPEG-7 XML Query Specification

MPEG-7 XML Documents

The document mpeg7video.xml (Figure 2) that we use for the query is an MPEG-7 XML document describing the content of a turbine inspection video (Figure 1). This video has been processed, and video objects have been extracted to generate this MPEG-7 description.

Figure 1: Industrial turbine inspection video

This MPEG-7 document consists of an AudioVisualContent of type "VideoType" named "TurbineVideo". The video is segmented into scenes and the scenes are described by using the "SegmentDecomposition" tag with the decomposition type "SpatioTemporal". Each segment or scene can have several objects of interest and they are described here as well. In particular, let's take a look at the second segment which has an id "BurnerScene" and is of type "MovingRegionType". We use the "MovingRegionType" tag because there are multiple objects that move over time. The detailed descriptions are as follows.

Figure 2: MPEG-7 XML document

The video segments (scenes) can be further broken up using the same "SegmentDecomposition" tag, again with decomposition type "SpatioTemporal". Taking a closer look, we find that the first object has an id "MR001" and that it moves over time along the trajectory given here. The tag "MediaTime" provides the duration of the object. The location of the object is defined temporally using the tag "ParameterTrajectory". At the first frame or instance where the object appears, the location is given by a 4x2 matrix defining the four coordinates of the object boundary. Any number of coordinates can be used to define the boundary. The complete interval, defined using the "WholeInterval" tag, consists of 300 seconds. The base time unit is 1 second (P1S). There are 25 node points, which determine the "KeyPointNum". The "InterpolatedValue" tag is used to define the corresponding coordinates of the object of interest at each of these nodes. Each KeyValue gives the coordinate location for a single vertex. This is done for all four vertices that constitute the boundary in this case. The value 0 of the attribute "MotionModel" denotes a linear model: for frames that lie between two nodes, a simple linear interpolation is used to determine the actual location in that frame. The rest of the example follows the above format to describe other objects and scenes in the video.
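The linear motion model described above can be illustrated with a short sketch. It is a minimal Python illustration, not the authors' tool: the function name `interpolate_region`, the list-of-vertices encoding and the sample coordinates are all assumptions made for this example, and the key times need not match the values in Figure 2.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def interpolate_region(key_times: Sequence[float],
                       key_regions: Sequence[Sequence[Point]],
                       t: float) -> List[Point]:
    """Return the boundary vertices at time t by linear interpolation.

    key_times[i] is the time of node point i (ascending), and
    key_regions[i] lists the boundary vertices at that node point.
    """
    if not key_times[0] <= t <= key_times[-1]:
        raise ValueError("t lies outside the described interval")
    # Index of the last node point at or before t.
    i = bisect_right(key_times, t) - 1
    if i == len(key_times) - 1:
        return list(key_regions[i])
    t0, t1 = key_times[i], key_times[i + 1]
    w = (t - t0) / (t1 - t0)  # fraction of the way from node i to node i + 1
    return [(x0 + w * (x1 - x0), y0 + w * (y1 - y0))
            for (x0, y0), (x1, y1) in zip(key_regions[i], key_regions[i + 1])]

# Hypothetical four-vertex boundary that shifts 12 units right in 12 seconds.
SQUARE_START = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
SQUARE_END = [(12.0, 0.0), (22.0, 0.0), (22.0, 10.0), (12.0, 10.0)]
MIDWAY = interpolate_region([0.0, 12.0], [SQUARE_START, SQUARE_END], 6.0)
```

Each vertex is interpolated independently, which mirrors the description of one KeyValue per vertex at each node point.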

We have developed a tool based on the scene change technique [Chakraborty 99] to generate such a description from a video, as follows. First, the video is broken down temporally into scenes or shots using scene change detection algorithms that can detect both abrupt and gradual changes. Next, users identify objects of interest within these scenes and outline them. The objects are then tracked over time in a semi-automatic way. Wherever there is a significant motion change and a linear model is inadequate, a node point is created. To make things simpler, as in the above example, one can also divide the interval into equal segments. At the segment boundaries, node points are created and the object outline is described.

Specifying Multimedia Objects as Temporal, Audio, and Visual Datatypes

In [Liu1 00] [Liu2 00], the authors showed that multimedia objects can be described as spatial, temporal and visual datatypes by using abstract datatype (ADT) techniques. Composite datatypes can be constructed from more primitive ones. These datatypes can be formalized as XML element datatypes within the W3C XML Schema [XML Schema Part 1: Structures] framework, particularly the datatype part [XML Schema Part 2: Datatypes]. The relationships of multimedia objects are often derived from element datatypes rather than from element hierarchical relationships. The relationships can even be predefined as further complex datatypes for multimedia XML documents. A similar technique for specifying moving objects in relational databases was proposed by [Erwig 99] [Manolopoulos 00].

At the 51st MPEG meeting in March 2000, the MPEG committee decided to adopt XML Schema Language as the MPEG-7 Description Definition Language (DDL) for describing multimedia content. Since then, a comprehensive set of audio and visual datatypes has been under development based on XML datatype mechanisms. The main components of the MPEG-7 standard are Descriptors (Ds) for describing audio and visual features, and Description Schemes (DSs) for describing the structure and semantics of the relationships between components. The components can be either Ds or DSs. There is also a description definition language that allows the creation of a new D or DS and the extension of existing Ds or DSs.

The MPEG-7 datatype hierarchy can be viewed as follows. The base level datatypes are: Mpeg7Type, basic datatypes, reference datatypes, unique identifier datatypes, and time datatypes. Mpeg7Type provides the basic abstract type at the root of the MPEG-7 type hierarchy. From Mpeg7Type, DSType (Description Scheme Type) and DType (Descriptor Type) are derived. From DSType, SegmentType, RelationType, GraphType, VisualDSType and AudioDSType are derived. From DType, VisualDType and AudioDType are derived. From SegmentType, StillRegionType, VideoSegmentType, MovingRegionType, AudioSegmentType, AudioVisualSegmentType, and SegmentDecompositionType are derived. Some of the temporal, audio and visual datatypes are described as follows.
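The derivation chain just listed can be written down as a small lookup table. The following Python sketch is only an illustration of the hierarchy as stated in this paragraph; the names `PARENT`, `ancestors` and `is_derived_from` are assumptions of this example and are not part of MPEG-7 or MMDOC-QL.

```python
from typing import Dict, List

# Child type -> parent type, transcribed from the hierarchy described above.
PARENT: Dict[str, str] = {
    "DSType": "Mpeg7Type",
    "DType": "Mpeg7Type",
    "SegmentType": "DSType",
    "RelationType": "DSType",
    "GraphType": "DSType",
    "VisualDSType": "DSType",
    "AudioDSType": "DSType",
    "VisualDType": "DType",
    "AudioDType": "DType",
    "StillRegionType": "SegmentType",
    "VideoSegmentType": "SegmentType",
    "MovingRegionType": "SegmentType",
    "AudioSegmentType": "SegmentType",
    "AudioVisualSegmentType": "SegmentType",
    "SegmentDecompositionType": "SegmentType",
}

def ancestors(type_name: str) -> List[str]:
    """Return the base types of type_name, nearest first."""
    chain: List[str] = []
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain

def is_derived_from(type_name: str, base: str) -> bool:
    """True when base appears somewhere above type_name in the hierarchy."""
    return base in ancestors(type_name)
```

For instance, `ancestors("MovingRegionType")` walks up through SegmentType and DSType to Mpeg7Type, which is the kind of check a datatype-aware query processor would need when testing an element against a type name.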

  • MPEG-7 temporal datatypes are used to specify either real world time or time used for audiovisual media. They are all derived from the MPEG-7 time datatypes: TimeType (for real world time) and MediaTimeType (for the time used in audio and visual media data). Each of them consists of a time point description and a time duration description. The 13 typical temporal relationships [Allen 83], such as after, before and meets, can also be defined as MPEG-7 BinaryTemporalSegmentRelationType, which is derived from MPEG-7 RelationType in the type hierarchy.
  • MPEG-7 visual datatypes are used to specify visual properties of multimedia objects such as spatial extent, color, texture, motion and location. All visual datatypes are derived from VisualDType. The spatial datatypes are used to specify geometric data such as points, polylines or regions. Composite visual datatypes can be constructed from these primitives. Examples are RegionShapeType, ContourShapeType and RegionLocatorType. In our example, we use RegionLocatorType, which describes a video object by a coordinate matrix holding its boundary points as coordinate pairs.
  • MPEG-7 audio datatypes are used to specify audio content. Examples are SoundEffectCategoryType and SilenceType. All audio datatypes are derived from AudioDType.
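The 13 temporal relationships of [Allen 83] mentioned in the first item above can be made concrete with a short classifier over time intervals. This is a minimal Python sketch: the function name `allen_relation`, the (start, end) tuple encoding and the relation labels are choices of this example and are not taken from the MPEG-7 schema.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) with start < end

def allen_relation(a: Interval, b: Interval) -> str:
    """Name the Allen relation that holds from interval a to interval b."""
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must have positive length")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only the two partial-overlap cases remain at this point.
    return "overlaps" if a0 < b0 else "overlapped-by"
```

Exactly one of the 13 labels is returned for any pair of proper intervals, so a relation such as before or meets can be derived from two stored time intervals rather than stored explicitly.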

MPEG-7 temporal, audio and visual datatypes can be further composed into more complex MPEG-7 datatypes by applying the XML datatype definition mechanism to predefined MPEG-7 Ds or DSs. The commonly used DSs for composing content are: SegmentDecomposition DS, Segment DS (e.g., MovingRegion DS, StillRegion DS, etc.), Graph DS and Relation DS. Each DS or D is itself an MPEG-7 datatype. For example, the MPEG-7 ParameterTrajectory datatype, SpatioTemporalLocator DS and MovingRegion DS are all spatio-temporal composite datatypes, called ParameterTrajectoryType, SpatioTemporalLocatorType and MovingRegionType respectively, for specifying spatial data that changes over time. These spatio-temporal datatypes are constructed from primitive temporal datatypes (e.g., MediaTime) together with spatial datatypes (e.g., RegionLocatorType) or previously defined spatio-temporal datatypes. In addition to the content description DSs in MPEG-7, there are many other DSs that facilitate content navigation, content organization, content management, and user interaction. MPEG-7 DSs are used to support a variety of multimedia content retrievals, such as semantics-based retrieval, structure-based retrieval, model-based retrieval, and navigation/browsing (e.g., content summary).

In our video example, we use one top-level SegmentDecomposition DS consisting of many first-level Segment DSs of type “MovingRegionType”. Each Segment DS corresponds to a scene. The details are given for the second scene or Segment DS. This burner scene is further decomposed by a second-level SegmentDecomposition DS, which consists of many Segment DSs corresponding to video objects. Each video object is described by the elements “MediaTime”, “SpatioTemporalLocator” and “ParameterTrajectory” as a composite spatio-temporal datatype. Since MPEG-7 content descriptions depend heavily on XML datatypes, accessing MPEG-7 XML content and expressing relationships require an expressive XML query language with multimedia datatype support for media-rich XML content retrieval.

Query Specification Issues for MPEG-7 XML

MPEG-7 XML documents pose an interesting challenge for XML query language design for covering an important aspect of XML structure and datatype usage. In the following, we address three crucial query specification issues in MPEG-7 XML document retrievals.

  1. Intensional Data and Relationship Specifications. Extensional data and relationships are those that are explicitly stored in XML documents. Intensional data and relationships are those that are computed or deduced from extensional data and relationships in XML documents. Many relationships of multimedia objects in MPEG-7 documents are derived from stored content descriptions based on element datatypes or DS schemes rather than from XML element hierarchical relationships. Thus, the capability of expressing these relationships in query language constructs is crucial for MPEG-7 query specifications. Examples of such relationships are point-inside and region-overlap.
    In addition, unlike data in relational databases, many spatial and temporal data are represented implicitly inside MPEG-7 XML documents. For instance, an instance of the MediaTime element in MPEG-7 denotes a time interval. It is important to be able to express the implicit MediaTimePoints of that interval in the query language, since identification of multimedia objects may depend on a particular MediaTimePoint.
  2. Document Addressing Specifications. MPEG-7 XML documents often contain irregular document structures. For instance, a Segment tag can appear inside another Segment tag in MPEG-7 XML documents. MPEG-7 content structures are based on their own datatypes and description schemes (DSs) rather than on the XML element hierarchy. MPEG-7 XML documents are normally not data-centered documents, which are collections of almost identical structures. A full document addressing query construct is needed to specify precisely the desired document locations in recursive or contextual XML structures for retrieving information.
  3. Co-occurrence Constraints Specifications. Multimedia object descriptions have temporal and spatial synchronization constraints by nature. Thus MPEG-7 XML document elements normally have co-occurrence constraints, e.g., if one XML element for a multimedia object description has attribute A in a certain spatial location, it must have the same attribute A in another location. Another example is that two multimedia objects appear inside the same spatial region at the same time.

Overview of MMDOC-QL

In response to these specification issues, we have designed an experimental XML query language, MMDOC-QL. This language embeds a logic formalism, Path Predicate Calculus, for specifying queries. The path predicate calculus can adequately support the co-occurrence constraints and document addressing specifications for querying XML documents. To support intensional data and relationship specifications in this logical formalism, certain stereotypical logic operators are incorporated into the query language for asserting multimedia object relationships. Examples of the multimedia logic operators are OVERLAP(element1: RegionLocatorType, element2: RegionLocatorType), TRAJECTORY(element1: MovingRegionType, element2: MediaTimePoint), etc. Another logic operator, MEMBERP, is also included in the language constructs for asserting intensional data such as a MediaTimePoint.

In the following, we illustrate MMDOC-QL for specifying MPEG-7 XML document queries. The example query has the form "find all video object ids, and the times at which they appear, over a particular area".


In MMDOC-QL, there are four clauses:

  • The OPERATION clause (either GENERATE, INSERT, DELETE, or UPDATE) describes the logic conclusions in the form of allowable element predicates and path predicates. In this paper, we focus on the retrieval operation clause, using the keyword GENERATE, for MPEG-7 XML queries. The GENERATE clause is similar to SELECT in SQL, but works on XML documents.
  • The PATTERN clause describes the domain constraints of free logical variables, including tag, attribute, content, address and datatype variables, by using regular expressions.
  • The FROM clause describes the source documents for querying.
  • The CONTEXT clause describes logic assertions about document elements as allowable logic formulas in path predicate calculus.

FROM and CONTEXT clauses are paired, and there can be multiple pairs for describing multiple sources. Logic variables are indicated by "%", as in "%objectid". Queries in MMDOC-QL are equivalent to finding all proofs of the existential closure of the logical assertions.

In this example, the path formula (<Segment> WITH xsi:type=“MovingRegionType” ... <MediaTime> AT %x))) in the CONTEXT clause asserts that element “Segment” with id equal to %objectid contains element “SpatioTemporalLocator”, which locates the video object during MediaTime %x. In general, (<%t> WITH attribute1=%x1, ..., attributen=%xn AT %a CONTAINING %c) is an English-like notation for the element predicate E(x1, x2, ..., xn, c, t, a), which stands for a logic assertion that element "t" at address "a" contains "c" with attributes x1, x2, ..., xn in a document tree. A path logic formula is a composition of element predicates by XPath [XPath 99] “axis-operators” such as DIRECTLY CONTAINING. Note that, in contrast to the context variables and functional forms of XPath, we use a logic form of the XPath axis-operators, with logical variables in the path formula, for asserting logical truths about document elements. The domain of the logical variable %objectid is restricted to strings beginning with “MR” followed by digits. The logic variable “%t” is used to bind a MediaTimePoint in the MediaTime interval “%x” during logic computation. The TRAJECTORY operator asserts the trajectory region of a moving region %movingregion at MediaTimePoint %t, and OVERLAP is a spatial logic operator that further asserts that the desired object region overlaps the focus area.
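The way these assertions combine can be sketched outside MMDOC-QL. The following Python fragment is a simplified illustration under loud assumptions: objects are plain dictionaries rather than MPEG-7 elements, regions are axis-aligned boxes, time points are whole seconds, and the functions `overlap`, `memberp` and `find_objects` are hypothetical stand-ins for the OVERLAP and MEMBERP operators and for the query as a whole. It does not reproduce the actual query text or the authors' implementation.

```python
import re
from typing import Dict, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed region encoding
Interval = Tuple[int, int]               # inclusive (start, end) in seconds

def overlap(a: Box, b: Box) -> bool:
    """OVERLAP stand-in: true when two axis-aligned boxes share interior points."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def memberp(t: int, interval: Interval) -> bool:
    """MEMBERP stand-in: t is one of the implicit time points of the interval."""
    return interval[0] <= t <= interval[1]

def find_objects(objects: List[Dict], focus: Box,
                 time_points: range) -> Iterator[Tuple[str, int]]:
    """Yield every (object id, time point) binding that satisfies all assertions."""
    id_pattern = re.compile(r"MR\d+")  # PATTERN-style restriction on %objectid
    for obj in objects:
        if not id_pattern.fullmatch(obj["id"]):
            continue
        for t in time_points:
            # obj["trajectory"](t) plays the role of TRAJECTORY at time point t.
            if memberp(t, obj["media_time"]) and overlap(obj["trajectory"](t), focus):
                yield obj["id"], t

# Hypothetical data: one object drifting right by 10 units per second.
OBJECTS = [
    {"id": "MR001", "media_time": (0, 4),
     "trajectory": lambda t: (10.0 * t, 0.0, 10.0 * t + 5.0, 5.0)},
    {"id": "Background", "media_time": (0, 4),
     "trajectory": lambda t: (0.0, 0.0, 100.0, 100.0)},
]
FOCUS = (18.0, 0.0, 32.0, 5.0)
RESULTS = list(find_objects(OBJECTS, FOCUS, range(0, 10)))
```

With this toy data, the object whose id fails the pattern is skipped, and the remaining object is reported only for the time points at which its interpolated box meets the focus area.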

Path Predicate Calculus

A form of logic, called Path Predicate Calculus, is defined below. It is embedded within our multimedia document query language, MMDOC-QL. Formulas in path predicate calculus are restricted forms of first-order predicate formulas. For these logic-based queries and manipulations, we have designed two important kinds of predicates, element predicates and path predicates, for asserting logical truth statements about document elements in a document tree. In the following, we first describe all allowable formulas in this logic by recursively defining well-formed formulas and then show several examples of XML modification manipulations specified in this formalism.

Formulas in path predicate calculus are of the form P(x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr), where x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr are free logic variables representing element attributes, element contents, tag names, element addresses, and element datatype members respectively. An occurrence of a variable in a formula is "free" if that variable has not been introduced by a "for all" or "there exists" quantifier. Otherwise, it is a "bound" variable. Queries in this logic formalism are equivalent to finding all proofs of the existential closure of P(x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr), i.e., of (EX x1)(EX x2)( ... )(EX dr)P(x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr). Detailed descriptions of the relationship between logic computation and specification can be found in [Gallaire 78] [Brown 85].
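One naive way to read "finding all proofs of the existential closure" is as an enumeration of every variable binding that makes the formula true. The Python sketch below illustrates that reading over small finite domains; the function `all_proofs` and the toy formula are assumptions of this example, and a real query processor would not need to enumerate bindings exhaustively.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

def all_proofs(domains: Dict[str, Sequence],
               formula: Callable[[Dict], bool]) -> Iterator[Dict]:
    """Yield each binding of the free variables for which formula holds.

    domains maps every free variable name to its finite set of candidate
    values; formula receives one complete binding as a dictionary.
    """
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        binding = dict(zip(names, values))
        if formula(binding):
            yield binding

# Toy formula P(x, y): x < y, over two small domains.
PROOFS = list(all_proofs({"x": [1, 2, 3], "y": [2, 3]},
                         lambda b: b["x"] < b["y"]))
```

Each yielded dictionary is one witness for (EX x)(EX y)P(x, y), and the query answer is the collection of all such witnesses.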

An atomic formula takes one of the following forms:

  1. E(x1, x2, ..., xn, c, t, a), where E is an element predicate and each of x1, x2, ..., xn, c, t, a is a constant or variable. The predicate E(x1, x2, ..., xn, c, t, a) stands for a logic assertion that element "t" at address "a" contains "c" with attributes x1, x2, ..., xn in a document tree. An English-like notation for the element predicate is (<%t> WITH attribute1=%x1, ..., attributen=%xn AT %a CONTAINING %c). For brevity, we can also use short versions with only the needed variables in logic queries, such as (<%t> WITH attribute1=%x1, ..., attributen=%xn), (<%t> CONTAINING %c), etc., if the full version is clearly implied by the context.
  2. mm-operator(x1, x2, x3, ..., xn), where x1, ..., xn are constants, element address variables or element datatype variables. An mm-operator(x1, x2, x3, ..., xn) asserts logic predicates about spatial, temporal, or visual relationships of document segments based on abstract datatypes in the XML Schema framework [XML 00]. Multimedia object descriptions can be specified as XML elements with spatial, temporal or visual datatypes. Based on abstract datatypes, many spatial, temporal and visual mm-operators, such as area-overlap, inside, nearby, time-before, time-after and color-similarity, can be defined for specifying intensional multimedia object relationships in XML documents.
  3. x op y, where op is an arithmetic comparison operator and x, y are either constants, element attribute variables, or element datatype variables.
  4. TYPEP(x tn), where x is a constant or variable and tn is an element datatype name, for asserting logic truth about an element datatype.
  5. MEMBERP(d tv), where d is a constant or variable and tv is an element address variable or an element with a datatype, for asserting logic truth about member d in tv with this datatype.
    For example, MEMBERP(“2” “<LIST>1 2 3 4</LIST>”) will be true if this instance of the LIST element is defined with a list-of-integers datatype in a document.
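The element predicate and MEMBERP can be illustrated with Python's standard xml.etree.ElementTree module. This is a sketch under stated assumptions: element addresses are represented as tuples of child positions, which is a convention of this example rather than the addressing scheme of the calculus, and `memberp` hard-codes the list-of-integers datatype instead of consulting a schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Tuple

Fact = Tuple[str, Dict[str, str], Tuple[int, ...], str]

def element_facts(root: ET.Element) -> Iterator[Fact]:
    """Yield one (tag, attributes, address, text) fact per element.

    Each fact is a ground instance of the element predicate: element
    "tag" at "address" contains "text" with the given attributes.
    """
    def walk(elem: ET.Element, address: Tuple[int, ...]) -> Iterator[Fact]:
        yield elem.tag, dict(elem.attrib), address, (elem.text or "").strip()
        for position, child in enumerate(elem, start=1):
            yield from walk(child, address + (position,))
    yield from walk(root, (1,))

def memberp(d: str, element_xml: str) -> bool:
    """MEMBERP for an element assumed to have a list-of-integers datatype."""
    element = ET.fromstring(element_xml)
    members = [int(token) for token in (element.text or "").split()]
    return int(d) in members

FACTS = list(element_facts(ET.fromstring('<a id="r"><b>hi</b><c/></a>')))
```

A query over element predicates can then be read as a search through these facts for bindings of the tag, attribute, address and content variables.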

All other allowable logic formulas are recursively defined from atomic ones.

  1. (Boolean formula) If P1 and P2 are well-formed formulas in path predicate calculus, then P1 AND P2, P1 OR P2, and NOT P1 are all well-formed formulas for asserting "P1 and P2 are both true", "P1 or P2 or both are true" and "P1 is not true" respectively.
  2. (Path predicate) If P1 and P2 are both well-formed formulas having at least one element predicate, then (P1 "axis-op" P2) is also a well-formed formula for asserting logic truths P1 with path constraint P2 about document elements in a document tree. The "axis-op" is one of the W3C XPath axis operators. Examples are (a) parent/child relationship operators such as INSIDE, DIRECTLY INSIDE, CONTAINING and DIRECTLY CONTAINING, and (b) sibling relationship operators such as BEFORE, IMMEDIATELY BEFORE, AFTER, IMMEDIATELY AFTER, SIBLING and IMMEDIATELY SIBLING. Note that here we illustrate a logic version of the axis concepts defined in XPath, since path formulas in Path Predicate Calculus are logical statements for asserting logical truths. An example of a path predicate is (<bibref> INSIDE (<paper> CONTAINING ((<fname> CONTAINING "Peiya") AND (<surname> CONTAINING "Liu")))), which specifies all bibref elements inside Peiya Liu's paper.
  3. (Quantified formula) If P is a formula, then (EX x)(P) is also a formula. The symbol EX is a quantifier read "there exists". The occurrences of x that are free in P are bound in (EX x)(P). The formula (EX x)(P) asserts that there exists a value of x such that, when we substitute this value for all free occurrences of x in P, the formula P becomes true. The only other quantifier, ALL, can be defined in a similar way. If P is a formula, then (ALL x)(P) is also a formula. The symbol ALL is a quantifier read "for all". The occurrences of x that are free in P are bound in (ALL x)(P). The formula (ALL x)(P) asserts that, for every possible value of x, when we substitute that value for all free occurrences of x in P, the formula P becomes true.
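The bibref path predicate in item 2 can be evaluated directly over a small document tree. The following Python sketch uses the standard xml.etree.ElementTree module; the document `DOC`, including its author wrapper element and its contents, is hypothetical, and the helper names `containing` and `bibrefs_in_matching_papers` are assumptions of this example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document used only for this illustration.
DOC = """
<papers>
  <paper>
    <author><fname>Peiya</fname><surname>Liu</surname></author>
    <bibref>XPath 99</bibref>
    <section><bibref>Allen 83</bibref></section>
  </paper>
  <paper>
    <author><fname>Amit</fname><surname>Chakraborty</surname></author>
    <bibref>Codd 72</bibref>
  </paper>
</papers>
"""

def containing(elem: ET.Element, tag: str, text: str) -> bool:
    """(<elem> CONTAINING (<tag> CONTAINING text)), at any depth below elem.

    The use of any() corresponds to an existential quantifier over
    the descendants of elem.
    """
    return any(text in (d.text or "") for d in elem.iter(tag) if d is not elem)

def bibrefs_in_matching_papers(root: ET.Element) -> List[str]:
    """Collect the text of every bibref INSIDE a paper by Peiya Liu."""
    found: List[str] = []
    for paper in root.iter("paper"):
        if containing(paper, "fname", "Peiya") and containing(paper, "surname", "Liu"):
            # INSIDE is the inverse of CONTAINING, so nested bibrefs count too.
            found.extend(b.text or "" for b in paper.iter("bibref"))
    return found

BIBREFS = bibrefs_in_matching_papers(ET.fromstring(DOC))
```

Replacing `paper.iter("bibref")` with a loop over direct children only would give the DIRECTLY INSIDE reading instead, which would exclude the bibref nested in the section element.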

Note that the domains of variables in P are finite in this path predicate calculus, since the particular document instance being queried contains only finitely many element attributes, element contents, tag names, element datatypes and element addresses. This "safe" property is required to avoid having to find all proofs of a query formula over infinite domains. In a real query language design, we can further restrict variables by using regular expressions for allowable variable patterns, as shown previously.

MPEG-7 Structured Content Query

MPEG-7 XML documents can organize multimedia content in a more structured manner to support better visual information retrieval [Del Bimbo 99], beyond feature-based content retrieval. To benefit from this, XML query language constructs need considerable expressive power for document structure and addressing specifications. In the following example, a more complex MPEG-7 structured content query is given to illustrate document addressing specifications in this logic formalism. In this query, we add more constraints to the CONTEXT clause, in the form of “find only those objects that are in the focus area and appear in a scene which comes either immediately before or immediately after the Burner scene”. This query requires the expressive power to specify the contexts of objects by a path formula with addressing constraints on parent/ancestor/child and sibling relationships among document elements in this recursive video segment structure.
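The sibling part of this constraint can be sketched with Python's standard xml.etree.ElementTree module. The fragment below is only an illustration: the document `DOC` is a hypothetical, heavily simplified nesting of Segment elements (scene ids other than "BurnerScene" and all object ids except "MR001" are invented), the focus-area test is omitted, and `adjacent_scene_objects` is not part of MMDOC-QL.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified structure: scenes are first-level Segment
# elements and video objects are Segment elements nested inside them.
DOC = """
<SegmentDecomposition>
  <Segment id="IntroScene"><Segment id="MR010"/></Segment>
  <Segment id="BurnerScene"><Segment id="MR001"/><Segment id="MR002"/></Segment>
  <Segment id="BladeScene"><Segment id="MR020"/></Segment>
  <Segment id="ExitScene"><Segment id="MR030"/></Segment>
</SegmentDecomposition>
"""

def adjacent_scene_objects(root: ET.Element, scene_id: str) -> List[str]:
    """Ids of objects in scenes IMMEDIATELY BEFORE or AFTER the given scene."""
    scenes = list(root)  # direct children only: the first-level scenes
    k = [scene.get("id") for scene in scenes].index(scene_id)
    neighbours = scenes[max(k - 1, 0):k] + scenes[k + 1:k + 2]
    return [obj.get("id")
            for scene in neighbours
            for obj in scene.iter("Segment")
            if obj is not scene]

NEIGHBOUR_OBJECTS = adjacent_scene_objects(ET.fromstring(DOC), "BurnerScene")
```

The sketch separates the two kinds of addressing that the query combines: sibling position among scenes, and containment of objects within a scene.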


Related Work

Two kinds of related work are described here. One is related to XML or SGML multimedia documents and the other is related to multimedia query languages.

ISO HyTime [HyTime 97], which is based on SGML, uses a Finite Coordinate Space (FCS) to define scheduled structures and events. These event schedules are intentionally designed for HyTime document presentation. FCS defines an abstract and system-independent method of specifying spatial and temporal information, separated from the content to be presented, as event schedules in a multidimensional coordinate space. The design motivation is presentation abstraction rather than information retrieval. The indexing scheme in HyTime offers only limited support for querying spatial/temporal media objects and structures.

W3C SMIL [SMIL 98] is based on XML and defines spatial and temporal layouts for SMIL document playout. The layout information relates to media display windows on a screen and to media playing time. Thus, the spatial and temporal structures provided in SMIL are also intended for presentation rather than as a storage representation to be accessed. Furthermore, there are structural differences in representation [Rutledge 98] [Liu 99]. Often, the presentation forms are not sufficient for storage representation. Spatial and temporal content descriptions are often less emphasized in presentation-oriented multimedia specifications.

SQL/MM and SQL3/Temporal [SQL Standardization Projects] are new ISO standardization projects for extending database query language capability to specify and manage multimedia objects and temporal information in the relational data model. Both focus on integrating time- or space-dependent multimedia objects into relational data models for querying. However, multimedia document models impose querying requirements that are quite different from those of the relational table model, since not only document content but also document structure must be available for retrieval. Query specifications based on relational data models would therefore limit the retrieval capability for document models.

Concluding Remarks

The emerging MPEG-7 standard uses XML Schema as a multimedia content description language. Many XML document query languages have been proposed, but there is still a lack of adequate query constructs and formalisms for specifying different aspects of XML documents, particularly those related to spatial, temporal and visual datatypes as in MPEG-7 documents. In this paper, we have identified critical specification issues that XML query language design must consider in order to support different aspects of XML documents. We use MPEG-7 documents to illustrate the issues: intensional data and relationships due to XML datatype mechanisms, irregular XML structures, and co-occurrence constraints.

The main contributions of this paper are (1) to identify the critical specification issues in XML query languages for XML document retrieval, illustrated using MPEG-7 XML documents, and (2) to propose solutions using a logic formalism, called path predicate calculus, that supports queries about XML documents with intensional data and relationships, irregular document structures, and co-occurrence constraints. Some query constructs are not covered here. However, the paper is intended to cover the essential features, to show the flavor of document predicates in a logic formalism, and to show their importance for specifying XML document retrievals. We believe that this direction of research is important for XML query language design, development and standardization. MPEG-7 XML documents reveal the weaknesses of current XML query language proposals.


[Allen 83] J. F. Allen, Maintaining Knowledge about Temporal Intervals, Comm. ACM 26(11), 1983

[Brown 85] F. Brown and P. Liu, A Logic Programming and Verification System for Recursive Quantificational Logic, Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85), Los Angeles, CA, 1985

[Chakraborty 99] A. Chakraborty, P. Liu and L. Hsu, Authoring and Viewing Video Documents using SGML structure, 1999 IEEE International Conference on Multimedia Computing and Systems, pp. 654-660, Florence, Italy

[Codd 72] E. F. Codd, "Relational completeness of data base sublanguages", in Data Base Systems (R. Rustin, ed.), Prentice-Hall, Englewood Cliffs, New Jersey, 1972

[Del Bimbo 99] A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999

[Erwig 99] M. Erwig, R. H. Guting, M. Schneider and M. Vazirgiannis, Spatio-Temporal DataTypes: Approach to Modeling and Querying Moving Objects in Databases, GeoInformatica Vol 3, No 3, 1999

[Gallaire 78] H. Gallaire, J. Minker and J. M. Nicolas, "An Overview and Introduction to Logic and Database", in Logic and Database (H. Gallaire and J. Minker, eds.), 1978

[HyTime 97] ISO/IEC 10744:1997 Hypermedia/Time-based Structuring Language (HyTime), Second Edition

[Liu 99] P. Liu, Y. F. Day, L. H. Hsu, Automatic Generation of DSSSL Specifications for Transforming SGML Documents into Card-Based Presentations, GCA Markup Technologies 99, PA, USA

[Liu1 00] P. Liu, L. H. Hsu, Spatial and Temporal Datatypes: An Approach to Specifying and Querying Multimedia Objects and Scheduled Structures in XML Documents, XML Europe 2000, Paris, 2000

[Liu2 00] P. Liu, A. Chakraborty, L. H. Hsu, Path Predicate Calculus: Towards a Logic Formalism for Multimedia XML Query Language, Extreme Markup Languages 2000, Montreal, Canada

[Lorel 00] S. Abiteboul, P. Buneman, and D. Suciu, Data on the Web, Morgan Kaufmann, 2000

[Manolopoulos 00] Y. Manolopoulos, Y. Theodoridis and V. J. Tsotras, Advanced Database Indexing, Kluwer Academic Publishers, 2000

[MPEG Web Site] http://www.cselt.it/mpeg/standards.htm

[Piottee 78] A. Piottee, "High Level Data Base Query Language", in Logic and Database (H. Gallaire and J. Minker, eds.), 1978

[Rutledge 98] L. Rutledge, L. Hardman, J. van Ossenbruggen and D. C. A. Bulterman, Structural Distinctions Between Hypermedia Storage and Presentation, in Proc. ACM Multimedia 98, September 1998, pp.145-150

[SDQL 96] ISO 10179:1996 Information Technology - Processing Languages - Document Style Semantics and Specification Language (DSSSL)

[SMIL 98] Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C Recommendation, 15 June 1998

[SQL Standardization Projects] http://www.jcc.com/SQLPages/jccs_sql.htm (SQL Standard Reference Page)

[Ullman 88] J. Ullman, "Principles of Database and Knowledge-Base Systems", Volume I, Computer Science Press, 1988

[XML 00] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000

[XML Query Algebra] XML Query Algebra: W3C Working Draft 15 February 2001

[XML Schema Part 1: Structures] XML Schema Part 1: Structures, W3C Proposed Recommendation 16 March 2001

[XML Schema Part 2: Datatypes] XML Schema Part 2: Datatypes: W3C Proposed Recommendation 16 March 2001

[XML-QL 99] A. Deutsch, M. Fernandez, D. Florescu, A. Levy and D. Suciu, A Query Language for XML, WWW'99

[XPath 99] XML Path Language (XPath) Version 1.0, W3C Recommendation, 16 November 1999

[XQL 98] J. Robie and J. Lapp, XML Query Language, QL'98, http://www.w3c.org/TandS/QL/QL98/

[XQuery] XQuery: A Query Language for XML: W3C Working Draft 15 February 2001

[YATL 98] Your Mediators Need Data Conversion, ACM-SIGMOD 1998

A Logic Approach for MPEG-7 XML Document Queries

Peiya Liu [Senior Member of Technical Staff, Siemens Corporate Research, Inc., Multimedia Documentation Program]
Amit Chakraborty [Member of Technical Staff, Siemens Corporate Research, Inc., Multimedia Documentation Program]
Liang H. Hsu [Manager and Distinguished Member of Technical Staff, Siemens Corporate Research, Inc., Multimedia Documentation Program]