Many query languages are currently being proposed for specifying XML document retrievals. The expressive power and usefulness of these query languages depend on their embedded formalisms and intended XML document applications. The emerging MPEG-7 multimedia standard uses XML Schema datatypes for multimedia content descriptions and poses an interesting challenge to XML query language design. Most XML query language proposals have limitations in specifying queries for this type of XML document. In this paper, we identify some critical specification issues in MPEG-7 XML queries and propose an XML query language, MMDOC-QL, with multimedia query constructs. MMDOC-QL is based on a logic formalism called path predicate calculus. In this calculus, the atomic logic formulas are element predicates rather than the relation predicates of relational calculus. Queries in this calculus are equivalent to finding all proofs of the existential closure of logical assertions, in the form of path predicates, that the elements of the document tree must satisfy. Spatial, temporal and visual datatypes and relationships can also be described in this formalism for content retrieval.
MPEG-7 is an emerging ISO/IEC standard formally named “Multimedia Content Description Interface”. Unlike the previous MPEG [MPEG Web Site] compression standards MPEG-1, MPEG-2 and MPEG-4, MPEG-7 aims to create a standard for describing multimedia content in order to integrate the production, distribution and content access paradigms. The MPEG-7 standard uses XML Schema to describe multimedia objects such as video, audio and images as spatial, temporal or visual XML datatypes. Multimedia XML documents of this type may include descriptions of both static/spatial media (such as text, graphics, drawings and images) and time-based media (such as video, audio and animation). The content can be further organized into three major document structures: hierarchical, hyperlinked, and temporal/spatial structures. MPEG-7 poses many interesting challenges in designing XML query languages that cover these different aspects of XML documents.
Many document query languages, such as [SDQL 96], [XML-QL 99], [Lorel 00], [YATL 98], [XQL 98] and the recent W3C [XQuery], have been proposed for document retrievals. However, these languages cannot adequately support MPEG-7 XML document queries, because they have limited expressive power over XML datatypes for specifying “intensional” data and relationships inside XML documents. This has limited the use of query languages in XML document retrievals. An ideal XML query language should support the different aspects of XML structures and datatypes.
In this paper, we identify several critical issues in MPEG-7 XML query specifications, namely “intensional data and relationships specifications”, “document addressing specifications” and “co-occurrence constraints specifications”. We tackle these issues in our XML query language, MMDOC-QL, by using a logic formalism called Path Predicate Calculus [Liu2 00] together with multimedia query constructs for specifying spatial and temporal relationships, in order to support MPEG-7 XML document retrieval and modification. We intend to open up a new direction of research in establishing logic formalisms and XML datatype specifications in query language design.
There are several advantages to this approach. First, these critical issues are tackled within the same logic framework. In the past, two formalisms have often been used for describing query languages in relational models [Ullman 88]: (1) an algebraic formalism, called relational algebra, and (2) a logic formalism, called relational calculus, which includes tuple relational calculus [Codd 72] and domain relational calculus [Piottee 78]. However, because the underlying relational data model differs from the document model, these formalisms could not be used directly as formalisms for XML query languages. In Path Predicate Calculus, queries are equivalent to finding all proofs of the existential closure of logical assertions that document elements must satisfy. The atomic logic formulae are element predicates that assert logic statements about document elements in a document tree. This paper will show that many spatial/temporal/visual operations can be expressed in such a logic formalism. The relational calculus is a special case of this logic form: when it is applied to "flat" data-oriented documents, element predicates degenerate into relational predicates as in relational models. Second, it provides "non-procedurality" of document queries. Historically, calculus-based relational query languages have been more prevalent than algebraic languages because of the declarative character of the logic formalism. The algebraic approach, taken by the W3C Query Working Group [XQuery] [XML Query Algebra], often needs to describe explicitly the order of operations on the underlying data model in order to express a query. The logic formalism provides a higher-level notion for expressing queries, since query processing is a logical computation that finds all proofs of the logic query statements.
In particular, co-occurrence constraints on XML elements are easier to express, and the formalism is integrated with query constructs for specifying multimedia object relationships when querying multimedia content descriptions. The path predicate approach can also work directly on the XML document model rather than on a specific data model of documents.
The rest of the paper is organized as follows. Section 2 describes MPEG-7 XML documents and the specification of multimedia objects as temporal/audio/visual datatypes, identifies issues in MPEG-7 query language construct design, and presents the proposed query language MMDOC-QL and its embedded path predicate calculus. Section 3 provides a more complicated example of an MPEG-7 structured content query in MMDOC-QL. Section 4 discusses related work in multimedia documents and query languages. Section 5 provides some concluding remarks.
The document mpeg7video.xml (2) that we use for the query is an MPEG-7 XML document describing the content of a turbine inspection video (1). The video has been processed, and video objects have been extracted to generate this MPEG-7 description.
This MPEG-7 document consists of an AudioVisualContent of type "VideoType" named "TurbineVideo". The video is segmented into scenes and the scenes are described by using the "SegmentDecomposition" tag with the decomposition type "SpatioTemporal". Each segment or scene can have several objects of interest and they are described here as well. In particular, let's take a look at the second segment which has an id "BurnerScene" and is of type "MovingRegionType". We use the "MovingRegionType" tag because there are multiple objects that move over time. The detailed descriptions are as follows.
The video segments (scenes) can be further broken up using the same "SegmentDecomposition" tag, again with decomposition type "SpatioTemporal". Taking a closer look, we find that the first object has the id "MR001" and moves over time; its trajectory is given here. The tag "MediaTime" provides the duration of the object. The location of the object over time is defined using the tag "ParameterTrajectory". At the first frame or instant where the object appears, the location is given by a 4x2 matrix defining the four coordinates of the object boundary. Any number of coordinates can be used to define the boundary. The complete interval, defined using the "WholeInterval" tag, lasts 300 seconds. The base time unit is 1 second (P1S). There are 25 node points, which determine the "KeyPointNum". The "InterpolatedValue" tag defines the coordinates of the object of interest at each of these nodes. Each KeyValue gives the coordinate location of a single vertex. This is done for all four vertices that constitute the boundary in this case. The value 0 of the attribute "MotionModel" indicates a linear model: for frames that lie between these nodes, simple linear interpolation determines the actual location in that frame. The rest of the example follows the above format to describe other objects and scenes in the video.
We have developed a tool based on the scene change technique [Chakraborty 99] to generate such a description from a video as follows. First, the video is broken down temporally into scenes or shots using scene change detection algorithms that can detect both abrupt and gradual changes. Next, the users identify objects of interest within these scenes and outline them. These objects are then tracked over time in a semi-automatic way. Wherever there is a significant motion change and a linear model is inadequate, a node point is created. To make things simpler, as in the above example, one can also divide the interval into equal segments. At the segment boundaries, node points are created and the object outline is described.
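To make the interpolation step concrete, the following Python sketch computes the position of one boundary vertex between two node points by linear interpolation. It is only an illustration under stated assumptions: the function name, the list layout of times and key values, the clamping outside the interval, and the sample coordinates are all hypothetical and are not taken from mpeg7video.xml or from the MPEG-7 schema.

```python
from bisect import bisect_right


def interpolate_vertex(key_times, key_values, t):
    """Linearly interpolate one vertex (x, y) at time t.

    key_times  -- sorted node-point times in seconds (hypothetical layout)
    key_values -- (x, y) position of the vertex at each node point
    """
    if not key_times or len(key_times) != len(key_values):
        raise ValueError("need exactly one key value per node point")
    # Assumption: outside the interval the nearest node value is returned.
    if t <= key_times[0]:
        return key_values[0]
    if t >= key_times[-1]:
        return key_values[-1]
    hi = bisect_right(key_times, t)  # first node point strictly after t
    lo = hi - 1
    t0, t1 = key_times[lo], key_times[hi]
    (x0, y0), (x1, y1) = key_values[lo], key_values[hi]
    w = (t - t0) / (t1 - t0)  # fraction of the way from node lo to node hi
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


# Made-up sample: three node points, 12.5 seconds apart.
times = [0.0, 12.5, 25.0]
values = [(10.0, 20.0), (20.0, 40.0), (20.0, 60.0)]
position = interpolate_vertex(times, values, 6.25)
```

Applying the same function to each of the four vertices would give the interpolated boundary of the object at an arbitrary frame time.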
In [Liu1 00] and [Liu2 00], the authors have shown that multimedia objects can be described as spatial, temporal and visual datatypes by using abstract datatype (ADT) techniques. Composite datatypes can be constructed from more primitive ones. These datatypes can be formalized as XML element datatypes within the W3C XML Schema [XML Schema Part 1: Structures] framework, particularly its datatype part [XML Schema Part 2: Datatypes]. The relationships between multimedia objects are often derived from element datatypes rather than from element hierarchical relationships. The relationships can even be predefined as additional complex datatypes for multimedia XML documents. A similar technique for specifying moving objects in relational databases was proposed in [Erwig 99] and [Manolopoulos 00].
At the 51st MPEG meeting in March 2000, the MPEG committee decided to adopt the XML Schema language as the MPEG-7 Description Definition Language (DDL) for describing multimedia content. Since then, a comprehensive set of audio and visual datatypes has been under development, based on XML datatype mechanisms. The main components of the MPEG-7 standard are Descriptors (Ds), which describe audio and visual features, and Description Schemes (DSs), which describe the structure and semantics of the relationships between components. The components can be either Ds or DSs. There is also a description definition language that allows the creation of new Ds or DSs and the extension of existing ones.
The MPEG-7 datatype hierarchy can be viewed as follows. The base level datatypes are: Mpeg7Type, basic datatypes, reference datatypes, unique identifier datatypes, and time datatypes. Mpeg7Type provides the main basic abstract type of the MPEG-7 type hierarchy. From Mpeg7Type, DSType (Description Scheme Type) and DType (Descriptor Type) are derived. From DSType, SegmentType, RelationType, GraphType, VisualDSType and AudioDSType are derived. From DType, VisualDType and AudioDType are derived. From SegmentType, StillRegionType, VideoSegmentType, MovingRegionType, AudioSegmentType, AudioVisualSegmentType, and SegmentDecompositionType are derived. Some of the temporal, audio and visual datatypes are described as follows.
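The derivation chain described above can be written down as a small lookup table, which is convenient when a query needs to test whether an element's datatype is derived from a given base type. The Python sketch below only transcribes the hierarchy stated in the text; the helper names `ancestors` and `is_derived_from` are hypothetical and are not part of MPEG-7 or MMDOC-QL.

```python
# Parent of each derived type, transcribed from the hierarchy described above.
PARENT = {
    "DSType": "Mpeg7Type",
    "DType": "Mpeg7Type",
    "SegmentType": "DSType",
    "RelationType": "DSType",
    "GraphType": "DSType",
    "VisualDSType": "DSType",
    "AudioDSType": "DSType",
    "VisualDType": "DType",
    "AudioDType": "DType",
    "StillRegionType": "SegmentType",
    "VideoSegmentType": "SegmentType",
    "MovingRegionType": "SegmentType",
    "AudioSegmentType": "SegmentType",
    "AudioVisualSegmentType": "SegmentType",
    "SegmentDecompositionType": "SegmentType",
}


def ancestors(type_name):
    """Return the chain of base types, nearest first (hypothetical helper)."""
    chain = []
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain


def is_derived_from(type_name, base):
    """True if `base` appears anywhere above `type_name` in the hierarchy."""
    return base in ancestors(type_name)


moving_region_chain = ancestors("MovingRegionType")
```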
MPEG-7 temporal, audio and visual datatypes can be further composed into more complex MPEG-7 datatypes by applying the XML datatype definition mechanism to predefined MPEG-7 Ds or predefined MPEG-7 DSs. The commonly used DSs for composing content are the SegmentDecomposition DS, the Segment DS (e.g., MovingRegion DS, StillRegion DS, etc.), the Graph DS and the Relation DS. Each DS or D is itself an MPEG-7 datatype. For example, the MPEG-7 ParameterTrajectory datatype, the SpatioTemporalLocator DS and the MovingRegion DS are all spatio-temporal composite datatypes, called ParameterTrajectoryType, SpatioTemporalLocatorType and MovingRegionType respectively, for specifying spatial data that changes over time. These spatio-temporal datatypes are constructed from primitive temporal datatypes (e.g., MediaTime) combined with spatial datatypes (e.g., RegionLocatorType) or with previously defined spatio-temporal datatypes. In addition to the content description DSs in MPEG-7, there are many other DSs that facilitate content navigation, content organization, content management, and user interaction. MPEG-7 DSs are used to support a variety of multimedia content retrievals, such as semantics-based retrievals, structure-based retrievals, model-based retrievals, and navigation/browsing (e.g., content summary).
In our video example, we use one top-level SegmentDecomposition DS consisting of many first-level Segment DSs of type “MovingRegionType”. Each Segment DS corresponds to a scene. The details are given for the second scene, or Segment DS. This burner scene is further decomposed by a second-level SegmentDecomposition DS, which consists of many Segment DSs corresponding to video objects. Each video object is described by the elements “MediaTime”, “SpatioTemporalLocator” and “ParameterTrajectory” as a composite spatio-temporal datatype. Since MPEG-7 content descriptions depend heavily on XML datatypes, accessing MPEG-7 XML content and expressing its relationships require an expressive XML query language with multimedia datatype support for media-rich XML content retrievals.
MPEG-7 XML documents pose an interesting challenge for XML query language design for covering an important aspect of XML structure and datatype usage. In the following, we address three crucial query specification issues in MPEG-7 XML document retrievals.
In answer to these specification issues, we have designed an experimental XML query language, MMDOC-QL. This language embeds a logic formalism, Path Predicate Calculus, for specifying queries. Path predicate calculus can adequately support the co-occurrence constraints and document addressing specifications for querying XML documents. To support intensional data and relationships specifications in this logic formalism, certain stereotypical logic operators are incorporated into the query language for asserting multimedia object relationships. Examples of the multimedia logic operators are OVERLAP(element1: RegionLocatorType, element2: RegionLocatorType), TRAJECTORY(element1: MovingRegionType, element2: MediaTimePoint), etc. Another logic operator, MEMBERP, is also included in the language constructs for asserting intensional data such as MediaTimePoint.
In the following, we illustrate MMDOC-QL for specifying MPEG-7 XML document queries. An example query has the form "find all video object ids and their show-up times over a particular area".
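As a rough illustration of what such operators assert, the Python sketch below gives one possible reading of OVERLAP and MEMBERP. It is not the definition used by MMDOC-QL: regions are assumed to be lists of (x, y) vertices and are compared by their axis-aligned bounding boxes (a simplification chosen here), and a media time interval is assumed to be a (start, duration) pair in seconds. All function names and data layouts are hypothetical.

```python
def bounding_box(vertices):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def overlap(region1, region2):
    """Approximate OVERLAP: do the two bounding boxes intersect?

    Touching edges count as overlapping; this is an assumption.
    """
    ax0, ay0, ax1, ay1 = bounding_box(region1)
    bx0, by0, bx1, by1 = bounding_box(region2)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def memberp(time_point, interval):
    """Approximate MEMBERP: is the time point inside (start, duration)?"""
    start, duration = interval
    return start <= time_point <= start + duration


# Made-up regions: an object outline and a focus area.
object_region = [(0, 0), (4, 0), (4, 3), (0, 3)]
focus_area = [(3, 2), (6, 2), (6, 5), (3, 5)]
far_area = [(5, 5), (6, 5), (6, 6), (5, 6)]
hits_focus = overlap(object_region, focus_area)
```

An exact polygon intersection test could replace the bounding-box comparison without changing how the operator is used in a query.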
In MMDOC-QL, there are four clauses. The OPERATION clause (either GENERATE, INSERT, DELETE, or UPDATE) is used to describe the logic conclusions in the form of allowable element predicates and path predicates. In this paper, we focus on the retrieval operation clause, which uses the keyword GENERATE, for MPEG-7 XML queries. The GENERATE clause is similar to SELECT in SQL, but works on XML documents. The PATTERN clause is used to describe, by means of regular expressions, the domain constraints of free logical variables, including tag, attribute, content, address and datatype variables. The FROM clause is used to describe the source documents to be queried. The CONTEXT clause is used to describe logic assertions about document elements as allowable logic formulas in path predicate calculus. FROM and CONTEXT clauses are paired together, and there can be multiple pairs for describing multiple sources. Logic variables are indicated by "%", as in "%objectid". Queries in MMDOC-QL are equivalent to finding all proofs of the existential closure of logical assertions.
In this example, the path formula (<Segment> WITH xsi:type=”MovingRegionType” ... <MediaTime> AT %x))) in the CONTEXT clause asserts that the element “Segment” with id equal to %objectid contains an element “SpatioTemporalLocator” in which the video objects are located during MediaTime %x. In general, (<%t> WITH attribute1=%x1, ..., attributen=%xn AT %a CONTAINING %c) is an English-like notation for the element predicate E(x1, x2, ..., xn, c, t, a), which stands for the logic assertion that element "t" at address "a" contains "c" with attributes x1, x2, ..., xn in a document tree. A path logic formula is a composition of element predicates by XPath [XPath 99] “axis-operators” such as DIRECTLY CONTAINING, etc. Note that, in contrast to the context variables and functional forms of XPath, we use a logic form of the XPath axis-operators with logical variables in the path formula for asserting logical truths about document elements. The domain of the logical variable %objectid is restricted to strings beginning with “MR” followed by digits. The logic variable “%t” is used to bind the MediaTimePoint in the MediaTime interval “%x” during logic computation. The TRAJECTORY operator is used to assert the trajectory region of a moving region %movingregion at MediaTimePoint %t, and OVERLAP is a spatial logic operator for further asserting that the desired object region also overlaps the focus area.
A form of logic, called Path Predicate Calculus, is defined below. It is embedded within our multimedia document query language, MMDOC-QL. Formulas in path predicate calculus are restricted forms of first-order predicate formulas. For these logic-based queries and manipulations, we have designed two important kinds of predicates, element predicates and path predicates, for asserting logical truth statements about document elements in a document tree. In the following, we will first describe all allowable formulas in this logic by recursively defining well-formed formulas and then show several examples of XML modification manipulations specified in this formalism.
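The domain restriction on %objectid can be pictured as a regular-expression filter over candidate bindings. The Python sketch below is an assumption about how such a PATTERN constraint might be checked, not MMDOC-QL syntax: the pattern `MR[0-9]+` reads "followed by digits" as "one or more digits", and the candidate ids other than "MR001" and "BurnerScene" are made up.

```python
import re

# Assumed reading of the constraint: "MR" followed by one or more digits.
OBJECT_ID = re.compile(r"MR[0-9]+")


def in_domain(value):
    """True if the whole string matches the %objectid pattern."""
    return OBJECT_ID.fullmatch(value) is not None


# Candidate bindings for %objectid; only matching ids stay in the domain.
candidates = ["MR001", "MR17", "BurnerScene", "MR", "MR001a"]
allowed = [c for c in candidates if in_domain(c)]
```

Using `fullmatch` rather than `match` keeps ids with trailing characters, such as "MR001a", out of the domain.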
Formulas in path predicate calculus are of the form P(x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr), where x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr are free logic variables representing element attributes, element contents, tag names, element addresses, and element datatype members respectively. An occurrence of a variable in a formula is "free" if that variable has not been introduced by a "for all" or "there exists" quantifier. Otherwise, it is a "bound" variable. Queries in this logic formalism are equivalent to finding all proofs of the existential closure of P(x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr), i.e., of (EX x1)(EX x2)( ... )(EX dr) P(x1, x2, ..., xn, c1, c2, ..., cm, t1, t2, ..., tp, a1, ..., aq, d1, ..., dr). Detailed descriptions of the relationship between logic computation and specification can be found in [Gallaire 78] and [Brown 85].
An atomic formula is in either of the following forms:
All other allowable logic formulas are recursively defined from atomic ones.
Note that the domains of the variables in P are finite in this path predicate calculus, since a particular document instance being queried contains a finite number of element attributes, element contents, tag names, element datatypes and element addresses. This "safe" property is required to avoid searching for proofs of a query formula over infinite domains. In a real query language design, we can further restrict variables by using regular expressions for allowable variable patterns, as shown previously.
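Because every domain is finite, "finding all proofs of the existential closure" can be pictured, in its most naive form, as enumerating every combination of candidate values and keeping the bindings that satisfy the formula. The Python sketch below illustrates only this idea; it is not the evaluation strategy of MMDOC-QL, and the toy element list, the variable names and the `all_proofs` helper are hypothetical.

```python
from itertools import product


def all_proofs(domains, formula):
    """Yield every binding of the free variables that makes `formula` true.

    domains -- dict mapping a variable name to a finite list of values
    formula -- callable taking a binding dict and returning True or False
    """
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        binding = dict(zip(names, values))
        if formula(binding):
            yield binding


# Hypothetical toy document: (tag, id attribute, address) triples.
elements = [
    ("Segment", "MR001", "1.2.1"),
    ("Segment", "MR002", "1.2.2"),
    ("MediaTime", "", "1.2.1.1"),
]

# Finite domains drawn from the document instance itself.
domains = {
    "t": sorted({e[0] for e in elements}),
    "x": sorted({e[1] for e in elements}),
    "a": sorted({e[2] for e in elements}),
}


def is_segment(binding):
    """Element predicate: a Segment element with id x exists at address a."""
    triple = (binding["t"], binding["x"], binding["a"])
    return binding["t"] == "Segment" and triple in elements


proofs = list(all_proofs(domains, is_segment))
```

The enumeration terminates only because each domain is a finite list, which is exactly what the "safe" property guarantees.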
MPEG-7 XML documents can organize multimedia content in a more structured manner to support better visual information retrieval [Del Bimbo 99], beyond feature-based content retrieval. To benefit from this, XML query language constructs need considerable expressive power for document structure and addressing specifications. In the following example, a more complex MPEG-7 structured content query is given to illustrate document addressing specifications in this logic formalism. In this query, we add more constraints to the CONTEXT clause, in the form "find only those objects in the focus area that show up in a scene appearing either immediately before or immediately after the Burner scene". This query requires enough expressive power to specify the contexts of objects by a path formula with addressing constraints on parent/ancestor/child and sibling relationships among document elements in this recursive video segment structure.
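The sibling constraint in this query can be illustrated procedurally. The Python sketch below uses the standard `xml.etree.ElementTree` module on a small hand-built tree that is only a simplified, hypothetical stand-in for mpeg7video.xml: scene and object ids other than "BurnerScene" and "MR001" are made up, the nesting is flattened, and the focus-area (OVERLAP) constraint is left out. It shows the addressing part of the query only and is not how MMDOC-QL evaluates it.

```python
import xml.etree.ElementTree as ET


def build_sample():
    """Build a simplified, hypothetical two-level segment structure."""
    root = ET.Element("SegmentDecomposition")
    layout = [
        ("IntroScene", ["MR010"]),
        ("BurnerScene", ["MR001"]),
        ("BladeScene", ["MR020", "MR021"]),
        ("EndScene", ["MR030"]),
    ]
    for scene_id, object_ids in layout:
        scene = ET.SubElement(root, "Segment", {"id": scene_id})
        for object_id in object_ids:
            ET.SubElement(scene, "Segment", {"id": object_id})
    return root


def objects_in_adjacent_scenes(root, scene_id):
    """Ids of objects in the scenes immediately before or after `scene_id`."""
    scenes = list(root)  # first-level scenes, in document order
    ids = []
    for i, scene in enumerate(scenes):
        if scene.get("id") != scene_id:
            continue
        for j in (i - 1, i + 1):  # preceding and following sibling
            if 0 <= j < len(scenes):
                ids.extend(obj.get("id") for obj in scenes[j].findall("Segment"))
    return ids


neighbours = objects_in_adjacent_scenes(build_sample(), "BurnerScene")
```

In the declarative setting, the loop over sibling positions is replaced by a path formula that asserts the sibling relationship and lets the logic computation find the matching bindings.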
Two kinds of related work are described here. One is related to XML or SGML multimedia documents and the other is related to multimedia query languages.
ISO HyTime [HyTime 97], which is based on SGML, uses Finite Coordinate Space (FCS) to define scheduled structures and events. These event schedules are intentionally designed for HyTime document presentation. FCS defines an abstract and system-independent method of specifying spatial and temporal information, separated from the content to be presented, as event schedules in a multidimensional coordinate space. The design motivation is based on presentation abstraction rather than information retrieval. The indexing scheme support in HyTime for querying spatial/temporal media objects and structures is limited.
W3C SMIL [SMIL 98] is based on XML and defines spatial and temporal layouts for SMIL document playout. The layout information is related to media display windows on a screen and to media playing time. Thus, the spatial and temporal structures provided in SMIL are also for presentation purposes rather than for a storage representation to be accessed. Furthermore, there are structural differences between the two kinds of representation [Rutledge 98][Liu 99]. Often, the presentation forms are not sufficient for storage representation. Spatial and temporal content descriptions are often less emphasized in presentation-oriented multimedia specifications.
SQL/MM and SQL3/Temporal [SQL Standardization Projects] are new ISO standardization projects for extending database query language capability to specify and manage multimedia objects and temporal information in the relational data model. Both are focusing on integration of time- or space- dependent multimedia objects into relational data models for query. However, multimedia document models impose requirements on querying, which are quite different from this relational table model since not only document content but also document structures must be available for retrieval. These proposed query specifications based on relational data models would limit the retrieval capability for document models.
The emerging MPEG-7 standard uses XML Schema as a multimedia content description language. Many proposed XML document query languages are available, but there is still a lack of adequate query constructs and formalisms for specifying different aspects of XML documents, particularly those related to spatial, temporal and visual datatypes as in MPEG-7 documents. In this paper, we have identified certain critical specification issues that XML query language design should consider in order to support different aspects of XML documents. We use MPEG-7 documents to illustrate the issues: intensional data and relationships due to XML datatype mechanisms, irregular XML structures, and co-occurrence constraints.
The main contributions of this paper are (1) to identify the critical specification issues in XML query languages for XML document retrieval, illustrated by using MPEG-7 XML documents, and (2) to propose solutions by using a logic formalism, called path predicate calculus, for supporting queries about XML documents with intensional data and relationships, irregular document structures, and co-occurrence constraints. There are still query constructs not included here. However, the paper intends to cover the essential features, to show the flavor of document predicates in a logic formalism, and to show their importance for specifying XML document retrievals. We feel that this direction of research is important for XML query language design, development and standardization. MPEG-7 XML documents reveal the weakness of current XML query language proposals.
[Allen 83] J. F. Allen, Maintaining Knowledge about Temporal Intervals, Comm. ACM 26(11), 1983
[Brown 85] F. Brown and P. Liu, A Logic Programming and Verification System for Recursive Quantificational Logic, Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85), Los Angeles, CA, 1985
[Chakraborty 99] A. Chakraborty, P. Liu and L. Hsu, Authoring and Viewing Video Documents using SGML structure, 1999 IEEE International Conference on Multimedia Computing and Systems, pp. 654-660, Florence, Italy
[Codd 72] E. F. Codd, "Relational completeness of data base sublanguages", in Data Base Systems (R. Rustin, ed.), Prentice-Hall, Englewood Cliffs, New Jersey, 1972
[Del Bimbo 99] A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999
[Erwig 99] M. Erwig, R. H. Güting, M. Schneider and M. Vazirgiannis, Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases, GeoInformatica, Vol. 3, No. 3, 1999
[Gallaire 78] H. Gallaire, J. Minker and J. M. Nicolas, "An Overview and Introduction to Logic and Database", in Logic and Database (H. Gallaire and J. Minker, eds.), 1978
[HyTime 97] ISO/IEC 10744:1997 Hypermedia/Time-based Structuring Language (HyTime), Second Edition
[Liu 99] P. Liu, Y. F. Day, L. H. Hsu, Automatic Generation of DSSSL Specifications for Transforming SGML Documents into Card-Based Presentations, GCA Markup Technologies 99, PA, USA
[Liu1 00] P. Liu, L. H. Hsu, Spatial and Temporal Datatypes: An Approach to Specifying and Querying Multimedia Objects and Scheduled Structures in XML Documents, XML Europe 2000, Paris, 2000
[Liu2 00] P. Liu, A. Chakraborty, L. H. Hsu, Path Predicate Calculus: Towards a Logic Formalism for Multimedia XML Query Language, Extreme Markup Languages 2000, Montréal, Canada
[Lorel 00] S. Abiteboul, P. Buneman, and D. Suciu, Data on the Web, Morgan Kaufmann, 2000
[Manolopoulos 00] Y. Manolopoulos, Y. Theodoridis and V. J. Tsotras, Advanced Database Indexing, Kluwer Academic Publishers, 2000
[Piottee 78] A. Piottee, "High Level Data Base Query Language", in Logic and Database (H. Gallaire and J. Minker, eds.), 1978
[Rutledge 98] L. Rutledge, L. Hardman, J. van Ossenbruggen and D. C. A. Bulterman, Structural Distinctions Between Hypermedia Storage and Presentation, in Proc. ACM Multimedia 98, September 1998, pp.145-150
[SDQL 96] ISO 10179:1996 Information Technology - Processing Languages - Document Style Semantics and Specification Language (DSSSL)
[SMIL 98] Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C Recommendation, 15 June 1998
[Ullman 88] J. Ullman, Principles of Database and Knowledge-Base Systems, Volume I, Computer Science Press, 1988
[XML 00] Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000
[XML Query Algebra] XML Query Algebra: W3C Working Draft 15 February 2001
[XML Schema Part 1: Structures] XML Schema Part 1: Structures, W3C Proposed Recommendation 16 March 2001
[XML Schema Part 2: Datatypes] XML Schema Part 2: Datatypes: W3C Proposed Recommendation 16 March 2001
[XML-QL 99] A. Deutsch, M. Fernandez, D. Florescu, A. Levy and D. Suciu, A Query Language for XML, WWW'99
[XPath 99] XML Path Language (XPath) Version 1.0, W3C Recommendation, 16 November 1999
[XQuery] XQuery: A Query Language for XML: W3C Working Draft 15 February 2001
[YATL 98] Your Mediators Need Data Conversion, ACM-SIGMOD 1998