Using Multi-Encryption to Provide Secure and Controlled Access to XML Documents

Tomasz Müldner
tomasz.muldner@acadiau.ca
Gregory Leighton
gleighto@cpsc.ucalgary.ca
Jan Krzysztof Miziołek
jkm@obta.uw.edu.pl

Abstract

Sharing XML documents within decentralized and distributed computing environments requires mechanisms to facilitate controlled and secure access to these documents. This includes the ability to make selective (parts of) documents available to users in multiple, possibly overlapping roles. One common solution involves the use of multiple encryption keys to enforce a defined access control policy over a document. In this paper, we describe the use of a technique called multi-encryption to provide secure and controlled access to XML documents in accordance with a specified protection policy. First, we define a key encryption function, which allocates the minimum number of keys needed to encrypt various parts of an XML document. An efficient algorithm to generate the key encryption function is described and used to encrypt every node with a single key, without adding any nodes to the original document. Since encrypting each node with a separate key is inefficient and reveals the structure of the document, we describe an algorithm which, for a given permission policy, finds the largest sub-trees that can be collapsed into a single encrypted node. Permissions are represented by meta-information which is visible only to authorized users. Users in numerous roles can securely access a complete, multi-encrypted document.

Keywords: Encryption; Interoperability

Tomasz Müldner

Tomasz Müldner is a professor of computer science at Acadia University in Nova Scotia, one of Canada's top undergraduate universities. He has received numerous teaching awards, including the prestigious Acadia University Alumni Excellence in Teaching Award in 1996. He is the author of over seventy papers and four books, including "C++ Programming with Design Patterns Revealed" and "C for Java Programmers". Dr. Müldner received his Ph.D. in mathematics from the Polish Academy of Science in Warsaw, Poland in 1975. His current research includes XML compression and encryption, P2P systems and algorithm explanation.

Gregory Leighton

Gregory Leighton is a Ph.D. student at the University of Calgary. His current research interests include XML data management and the Semantic Web.

Jan Krzysztof Miziołek

Jan Krzysztof Miziołek is a director of the Computing Services Centre for Studies on the Classical Tradition in Poland and East-Central Europe, Warsaw University, Warsaw. Dr. Miziołek received his Ph.D. in mathematics from Technical University of Lodz, Poland in 1981. He worked on design and implementation of a high-level programming language, LOGLAN-82. His current research includes XML compression and encryption.

Using Multi-Encryption to Provide Secure and Controlled Access to XML Documents

Tomasz Müldner [Jodrey School of Computer Science, Acadia University, Wolfville, Canada]
Gregory Leighton [Department of Computer Science, University of Calgary, Calgary, Canada]
Jan Krzysztof Miziołek [Centre for Studies on the Classical Tradition in Poland and East-Central Europe, Warsaw University, Warsaw, Poland ]

Extreme Markup Languages 2006® (Montréal, Québec)

Copyright © 2006 Tomasz Müldner, Gregory Leighton, and Jan Krzysztof Miziołek. Reproduced with permission.

Introduction

There is growing interest in providing controlled and secure access to data [Bertino 2004], [Devanbu 2001], [Bertino 2002]. In this context, controlled access allows the owner of data to specify permission policies indicating which users can access specific documents, or parts thereof. Secure access to data means that data is confidential, i.e. visible only to authorized parties, and data can be authenticated, i.e. the authorship of data can be verified. In addition, the integrity of data should be verifiable, to ensure that it has not been tampered with.

Since XML data has become a de facto standard for many applications, in particular for Web applications, much of the research done on controlled access in recent years deals with data in this format. Initial work on XML access control focused on centralized systems, and used cryptographic tools such as asymmetric keys and digital signatures. With the recent interest in distributed systems (e.g. peer-to-peer (P2P) overlay networks, mobile ad hoc networks (MANETs), and Web-based third-party architectures), the focus of the research has shifted to such systems (e.g. [Carminati 2005a]). In addition to proposed standards for encrypting portions of an XML document, such as the W3C XML Encryption Syntax and Processing recommendation [XML Encryption], approaches have been proposed in the research community that illustrate how multiple users holding varying permissions over an XML document can gain controlled access to the same version of that document, through the use of asymmetric encryption key pairs (e.g. [Miklau 2003]).

In this paper, we introduce techniques that can be used to implement controlled and secure access to XML documents or parts thereof. The owner of the document specifies permission policies which identify subjects, parts of the document visible to the subject, and a specific type of access, such as read or write. Subjects are defined using roles [Ferraiolo], and a permission policy associates roles with one or more views (parts of the document). A user in role R can access only those views over the document which are associated with role R.

Our work focuses on an efficient way of assigning keys used for efficient encryption, without making any modification to the structure of the original XML document. To accomplish the first task, we designed an optimal key assignment algorithm, which assigns the smallest possible number of keys that are required to provide controlled access to parts of the document. To accomplish the second task we designed an efficient algorithm to find the largest possible parts (sub-trees) of the document that can be encrypted with a single key.

The owner (creator) of the original XML document defines the protection policy, which specifies roles and associated views (parts of the document). This information is used to create a multi-encrypted document, encrypted with keys generated as described above, and then augmented with the meta-information used by subjects to gain access to available views. Users of a multi-encrypted document who are in role R can access only views associated with R in the policy. Since meta-information is also encrypted, users in role R cannot inspect permissions granted to users in other roles.

Contributions. Our approach mainly falls into the "secure publishing" class of approaches mentioned in the next section. These approaches rely on extending the original XML document with a (possibly large) number of nodes designed to support controlled access. In a significant departure from these approaches, we do not augment the original document with additional nodes to support multiple keys. We define an efficient algorithm to generate a key assignment function, which is used to allocate the minimum number of keys needed to provide controlled access to various parts of XML documents. The second algorithm, for a given permission policy, finds the largest sub-trees that can be collapsed into a single encrypted node. These two algorithms are used to encrypt the original XML document and hide the structure of the largest possible parts of the document. Permissions are represented by meta-information which is visible only to authorized users. A complete, multi-encrypted document gives a secure and controlled access to numerous users.

This paper is organized as follows. Section 2 provides a description of related work. Section 3 presents a construction of multi-encrypted documents, Section 4 describes various generalizations of roles, and Section 5 concludes the paper.

Related Work

We start by reviewing access control rules for XML documents. There are two basic approaches to defining who has permissions on a given document. In the first approach, called a role-based approach [Ferraiolo], each user of the system is assigned one or more roles, which determine permissions to access documents. Then, when a document is created, various roles determine permissions. For example, for a document D containing lecture notes for a course, a user in a "student" role may have read access to D, while a user in an "instructor" role may have read and write access to D . In the second approach, called a credential-based approach, a user is provided with attributes forming her or his credential, and then permissions are associated with credentials (typically, credentials are given per document). Credentials may be specified in various ways; for example Carminati et al [Carminati 2005c] define credentials by specifying XPath expressions [XPath] for a special "credential" document. A closer examination of these two approaches reveals that they are actually equivalent, because dynamic roles define credentials and vice-versa; in this paper we will employ a role-based approach. Now, let's review various approaches for defining rules stating what permissions specify. Permissions may be assigned on a per document basis, or at a finer level of granularity. For example, a user within role R1 may have read access to a specific part of the document D, while the user within role R2 may have read access to another (possibly overlapping) part of D.

The published approaches to XML access control can be loosely classified into three categories. The earliest approaches tended to follow a client/server structure (e.g. [Kudo 2000] and [Damiani 2002]), in which a server node is responsible for processing client requests by generating an appropriate view over an XML document in accordance with a pre-defined access control policy. While these strategies succeed in hiding the original, complete document from clients, such systems typically suffer from the sheer number of views which must be materialized in accordance with a complex access control policy. In addition, such approaches suffer from scalability issues and are vulnerable to attacks which exploit the fact that the server acts as a single point of failure (e.g. denial of service attacks). The recent shift to more distributed paradigms such as peer-to-peer and mobile ad hoc networks has led to the introduction of secure publishing approaches to XML access control (e.g. [Bertino 2002], [Crampton 2004], [Miklau 2003]). Rather than generate a customized view over the document for each class of users, a cryptographic transformation is applied to the document before it is published, typically through the use of public/private keys. Hence, all network users receive the same document, yet different users may have different access privileges depending on which keys they hold. Finally, secure third-party distribution strategies (e.g. [Bertino 2004], [Carminati 2005a], [Carminati 2005b]) are designed to address the situation in which the document owner and publisher are separate entities. This scenario occurs most frequently in service-oriented architectures, where the owner elects to utilize a (possibly untrusted) third-party to store the document and execute queries on it in response to client requests. Additional mechanisms such as digital signatures and message authentication codes (MACs) are frequently employed in these approaches to ensure that the publisher has not altered the owner's data and to ensure that the publisher is returning a complete and correct response to a client query.

There are two papers that specifically tackle issues considered in our paper. Miklau and Suciu [Miklau 2003] describe a scheme in which the owner publishes a partially encrypted XML document, which enforces all access control policies. Their approach uses so-called protection trees, in which the original XML document is extended with meta-nodes used to distribute keys. Carminati et al [Carminati 2005b] describe a framework for secure and controlled access in third-party distribution systems. In a manner similar to our technique, they encrypt all document portions to which the same policy applies with the same key. However, both approaches have the drawback of requiring the definition of security-enhanced documents that have (possibly numerous) meta-nodes added to the original document, therefore increasing the original document size by a potentially large amount.

Multi-encryption

In this section, we consider schema-less XML documents. We model XML documents as document trees. In the remainder of the paper, a key pair κ consists of a (public key κc, private key κe).

Introduction

Within our model, we make the following assumptions:

  • there is a fixed set of roles Ψ = {R1, R2, ...,Rt};
  • users who enter the system are assigned one or more roles;
  • for each role R, there is a key κR, called an external key associated with this role; and
  • the public part of the key κR is available to all users but the private part of the key κR is available only to users who are currently in role R.

Initially, we will restrict our focus to static roles (i.e. roles that are pre-existing before the permission policy is created); in Section 4, we will additionally consider dynamic roles (i.e. roles that can be defined by the document owner). Now, we define a view, which is a part of an XML document.

Definition 1. For an XML document D, let a view VD for D be defined as a pair (D, e), where e is an extended XPath path expression. Here, an extended XPath path expression is of the form p, or ¬p, where p is a valid XPath path expression for D. □

Hence, the view VD = (D, e) consists of the subset of the set of nodes of D which is returned by an evaluation of the path expression e. Since it may be cumbersome to define a complement of a view using XPath, for convenience we introduce ¬p as the complement of the set of nodes returned by an evaluation of p.

Specifying permissions for the document D requires the definition of the following:

  • a number of generic views (let V denote the union of all these views); and
  • two special views, Vread and Vwrite which define elements for which all users have, respectively, read and write access. All nodes in Vread ∪ Vwrite are not encrypted.

Additionally, note the following:

  • Elements from V ∩ Vread and V ∩ Vwrite are not encrypted; i.e. Vread and Vwrite specifications overrule V specifications.
  • Let V0 = D - (V ∪ Vread ∪ Vwrite) be the set of all elements which have not been defined in the above procedure. These elements will be hidden, i.e. encrypted and inaccessible to any user.
  • Here, a write permission does not automatically give a read permission (this property is useful in many scenarios; for example, operating on fingerprints).

To define access control to a document, a role is associated with one or more views so that the following protection requirement condition is satisfied:

The user in role R can access precisely the set of nodes formed by the set union of the views associated with R, in addition to nodes from the sets Vread and Vwrite.

Therefore, a user who has been assigned to a view V can access all nodes contained in V, Vread ∪ Vwrite, and nothing more. Roles are associated with views in a permission policy.

Definition 2. Given an XML document D, roles Rj ∈ Ψ for j =1,..., t, and views VD i for i = 1,..., k, a single permission pj is a tuple which associates one or more views with Rj, and is of the form pj = [Rj, read, VD i1 , VD i2 ,..., VD im , write, VD h1 , VD h2 ,...,VD hn ] where (m,n ≤ k). A permission policy Π(D) is a tuple [p1, p2,..., pt, Vread,Vwrite]; where for i=1, 2,..., t, pi is a single permission. □

For a single permission pj, the user in role Rj can access for reading document fragments defined by views that follow "read", and can access for writing document fragments defined by views that follow "write". We adopt some simplifying conventions; e.g. we skip the write part if there are no views in this part.

In order to consider access to various views, it is convenient to define a multi-view document, which abstracts away from roles:

Definition 3. For an XML document D, and a permission policy Π(D), a multi-view document is a tuple DΠ = [D, VD 0, VD 1,..., VD k, VD k+1], where

  • VD 1,...,VD k, VD read, VD write are all the views in Π(D);
  • VD k+1 = VD read ∪ VD write; and
  • VD 0 is an additional view which contains all nodes which do not belong to any view VD i, i = 1, 2,..., k. □

Now, we describe the basic ideas behind our techniques (for more details, see the next section). Various parts of the document will be encrypted with different keys. However, these keys can not simply be assigned per view. For example, consider a document D = {d1, d2, d3} and two views V1 = {d1, d2} and V2 = {d2, d3}. If we assigned keys κ1 and κ2 respectively to V1 and V2, and then encrypt the views with the corresponding keys, then the element d2 would have to be super-encrypted (using keys κ1 and κ2). In addition, the user in V1 would need the key κ2 to access d2, but this key gives access to d3, violating the protection requirement. In Section 3.3, we explain how keys are created and used to encrypt parts of the document.

Keys may be used to encrypt each element separately, but doing so would reveal the structure of the document and would be less efficient than using the W3C XML Encryption Syntax and Processing Recommendation [XML Encryption] and encrypting with a single key the entire sub-tree rooted at a specific element. Whether or not this is possible depends on specifications of views. Consider a document D, with two partially overlapping views: V1, which is a sub-tree rooted at node d, and V2, which consists of a single element c within V1. In this case, we can not encrypt the sub-tree V1 with a single key; however we could do so for a sub-tree rooted at c. Therefore, we will identify in D the largest sub-trees that can be encrypted with a single key. When a document is created, it is assigned additional information, called meta-information, that specifies the permission policy specifying which users can access various views of the document. We require that the following pseudo-anonymity condition is satisfied:

Meta-information specifying what parts of the document are available in role R is visible only to users in role R.

In the following subsections, we describe the process of creating a multi-encrypted document. This process consists of the following steps:

  1. identifying largest sub-trees of the document
  2. generating keys used to encrypt nodes specified by views
  3. using these keys to encrypt parts of the document
  4. creating meta-information associated with the document
  5. creating the multi-encrypted document

Sub-tree Identification

Let us now fix an XML document D, a permission policy Π(D), and the corresponding multi-view document DΠ = [D, VD 0, VD 1,..., VD k, VD k+1]. Algorithm 1 identifies the set ΘD of roots of all largest and complete sub-trees which consist of nodes from exactly one view. A sub-tree rooted at d ∈ D is called complete if it consists of all descendents of d, and is of height at least two.

Sub-tree Identification Algorithm 1.

Input: multi-view XML document DΠ = [D, VD 0, VD 1,..., VD k, VD k+1].

Output: set ΘD = {d ∈ D: the largest complete sub-tree θ(d) rooted at d containing elements from exactly one view in DΠ}.

Method. Here, we provide a sketch of the algorithm, formulating our problem as searching for the largest complete single-color sub-trees in a multi-colored tree. The algorithm works by using a modified depth-first search (dfs) traversal of the tree D, starting from the root of D, and using an auxiliary list of nodes. When a leaf is reached, dfs returns the colors associated with this leaf. When a list of children of node d is traversed in a dfs order, then as long as all the current sub-trees have a single color which is the same as that of d, roots of these sub-trees are stored in an auxiliary list. Otherwise, the elements of the auxiliary list are moved to the solution set, and from now on all roots of single-colored sub-trees are being added to the solution set. □

In the description of Algorithm 1, we have not considered the presence of ID-IDREF relationships between tree nodes: for example, if there were a node x contained in the sub-tree rooted at node d which contained a reference to a node y not within this sub-tree. In this case, a user with access to x should logically have access to y as well. Our future work will consider fine-tuning the algorithm to additionally color these referenced nodes with the color(s) of the referring node.

Note that from the construction of the set ΘD it follows that for any two elements d1, d2 ∈ ΘD, θ(d1) is disjoint from θ(d2).

Example 1. Consider the document D shown in Fig. 1 with two views V1 = {C,E,G,H} and V2 = {C,D}. The set ΘD consists of one tree: θ(E) = {E,G,H}.

Figure 1: Simple multi-view document.
[Link to open this graphic in a separate page]

In the next section, we describe how keys are generated for various parts of a multi-view document.

Key Assignment

The key assignment process assigns internal keys to nodes in a document D, based on how the set of nodes is partitioned by views. Let us now consider an XML document D, a permission policy Π(D), and the corresponding multi-view document DΠ = [D, VD 0, VD 1,..., VD k, VD k+1], and consider the task of assigning a set of keys K that will enforce Π(D) over DΠ.

Definition 4. A key assignment function is a mapping ξ:D->K, where D is the XML document and K is the domain of assignable key sets. □

We use a key assignment function ξ as follows.

  • The node s ∈ VD i will be encrypted with ξ(s).
  • To encrypt nodes in VD i, we will need the set of keys defined as Neededξ(VD i) = ξ(VD i).
  • The set of nodes in D that can be decrypted with keys from the set of keys K0 is defined as Availableξ(K0) = ξ-1(K0).

The protection requirement for the view VD i is satisfied iff Availableξ(Neededξ(VD i)) = VD i. This condition is clearly satisfied by any one-to-one function ξ:D->K, however such functions may unnecessarily assign too many keys, as illustrated by Example 2.

Example 2. Consider a document D with the set of nodes {s1,s2,s3}, views V1: {s1,s2} and V2: {s1,s2,s3}, and the following two key assignments ξ1 and ξ2 (below, keys κi ∈ K):

ξ1(si) = κi for i = 1, 2, 3; and

ξ2(s1) = ξ2(s2) = κ1, ξ2(s3) = κ2.

Here, both functions satisfy the protection condition: ξ1 because it is one-to-one, and ξ2 because we have

Neededξ2 (V1) = {κ1};

Availableξ2 (Neededξ2 (V1)) = V1;

Neededξ2 (V2) = {κ12}; and

Availableξ2 (Neededξ2 (V2)) = V2. □

Above, the function ξ2 is preferred over ξ1, because it assigns fewer keys. Therefore, instead of using one-to-one functions, we will use functions that are weakly one-to-one, but still satisfy the protection condition. First, we define a characteristic vector χ:D->{0,1}n where n is the total number of views in D, as follows: χ(s) = {[c1, c2,..., cn]: for i = 1, 2,..., n; ci = 1 if s ∈ VD i and ci is 0 otherwise}.

Definition 5. A key assignment ξ:D->K is said to be correct if it satisfies the following condition: ξ(s) = ξ(t) => χ(s) = χ(t) for any two elements s, t ∈ D. □

Example 3. Consider a document D with the set of nodes {s1, s2, s3, s4, s5, s6}, and three views defined as follows:

V1 = {s1, s2, s3, s5, s6}, V2 = {s2, s3, s4}, and V3 = {s1, s5}.

Hence, we obtain the characteristic vectors χ(s1) = [1, 0, 1], χ(s2) = [1, 1, 0], χ(s3) = [1, 1, 0], χ(s4) = [0, 1, 0], χ(s5) = [1, 0, 1], and χ(s6) = [1, 0, 0].

Let us consider the following key assignment:

ξ(s1) = ξ(s5) = κ1, ξ(s2) = ξ(s3) = κ2, ξ(s4) = κ3, and ξ(s6) = κ4.

It can be easily verified that the above key assignment is not one-to-one but it is correct; for example, we have

ξ(s1) = ξ(s5) and χ(s1) = χ(s5). □

Lemma 1. If the key assignment ξ is correct then the protection condition is satisfied, i.e.

Availableξ(Neededξ(VD i)) = VD i, for i = 1, 2,..., n.

Proof. Consider a fixed i, 1 ≤ i ≤ n. It is enough to show that Availableξ(Neededξ(VD i)) = ξ-1(ξ(VD i)) is a subset of VD i. Take x ∈ ξ-1(ξ(VD i)), i.e. ξ(x) ∈ ξ(VD i), and suppose that x does not belong to VD i. There exists y ∈ VD i such that ξ(x) = ξ(y), and from the key assignment property we have χ(x) = χ(y). However, the i-th component of χ(y) is 1, and the i-th component of χ(x) is 0, which gives us the required contradiction. □

Now, we present an algorithm which produces a key assignment:

Key Assignment Algorithm 2.

Input: DΠ = [D, VD 0, VD 1,..., VD k, VD k+1].

Output: correct key assignment ξ:D->K.

Method. Let nodes of D be {s1, s2,..., sm} and consider a sequence of characteristic vectors χ(s1), χ(s2),..., χ(sm). We sort this sequence and remove duplicates; the number of remaining vectors is the number of keys assigned. Then, we associate keys with remaining vectors; for s ∈ D, we define ξ(s) to be a key associated with χ(s). □

Theorem 1. The key assignment algorithm produces a correct key assignment with time complexity O(n), where n is the number of nodes in document D, and it produces the minimum number of keys.

Proof. We have to show that the function ξ generated in the algorithm satisfies Definition 5. However, if two elements are assigned the same key, then from the method used in the algorithm, they have the same characteristic vectors. To get the time complexity, note that each characteristic vector is of fixed length, and can therefore be sorted in linear time using radix sort. Therefore, it takes O(n) time to sort these vectors and remove duplicates. To show that the algorithm is key-optimal, let us use an inductive argument and assume that S'={s1, s2,..., si-1} are assigned key-optimally with r keys. If si has a distinct characteristic vector than elements of S', then, since si requires a distinct key, r+1 keys are required, and the algorithm does assign r+1 keys. If si has the same characteristic function as sj, where 1 ≤ j ≤ i, then only r keys are still required, and the algorithm does assign r keys. □

Example 4. For Example 3, the characteristic vectors are the following:

χ(s1) = [1, 0, 1], χ(s2) = [1, 1, 0], χ(s3) = [1, 1, 0], χ(s4) = [0, 1, 0], χ(s5) = [1, 0, 1], χ(s6) = [1, 0, 0].

After removing duplicates, we have four vectors:

[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]

to which we assign four keys; respectively κ1, κ2, κ3, κ4.

For the above key assignment, we have

Neededξ(V1) = {κ1, κ2, κ4}, Availableξ(Neededξ(V1)) = {s1, s2, s3, s5, s6} = V1,

Neededξ(V2) = {κ2, κ3}, Availableξ(Neededξ(V2)) = {s2, s3, s4} = V2, and

Neededξ(V3) = {κ1}, Availableξ(Neededξ(V3)) = {s1, s5} = V3. □

Creating a Multi-Encrypted Document: Step 1, Encrypting

Recall that the creator (owner) of a document D wants to define access permissions to this document for various users through the permission policy Π(D). Based on specifications in Π(D), the system will create the multi-encrypted document EncΠ(D). The document EncΠ(D) will be made available to other users, who will be able to test their permissions and use the parts of D made accessible to a role R as long as they are in this role.

This section describes the first step towards building a multi-encrypted document EncΠ(D) of the document DΠ = [D, VD 0, VD 1,..., VD k, VD k+1], in which parts of D will be encrypted, and the resulting document will be denoted by Enc-1Π(D).

Consider a multi-view document based on the permission policy Π(D):

DΠ = [D, VD 0, VD 1,..., VD k, VD k+1].

Let ΘD be the set of trees generated by Algorithm 1, and ξ be a key mapping generated by Algorithm 2. In step 1, elements from the set VD k+1 are not encrypted; the remaining elements d are encrypted with the private key of the internal key pair ξ(d). Specifically, elements of the set D-ΘD are encrypted using a single-element encryption. On the other hand, for an element d∈ΘD, we encrypt the entire tree θ(d) using the W3C XML Syntax and Processing Recommendation [XML Encryption].

For the document DΠ from Example 1, the multi-encrypted document Enc-1Π(D) is shown in Fig. 2. Here, A', B', etc. represent encrypted single nodes A, B, etc. However, E' represents an encrypted sub-tree θ(E).

Figure 2: Encrypted document Enc-1Π(D).
[Link to open this graphic in a separate page]

Note that the encryption process partially hides the structure of the document; the structure of sub-trees from the set ΘD is hidden, but the structure of other nodes is revealed (however, values at these nodes are encrypted).

From the construction used in our algorithms, we derive the following theorem.

Theorem 2. For a multi-view document DΠ = [D, VD 0,VD 1,VD 2,...,VD k+1], if all views VD i, i = 1, 2,..., k+1 form disjoint sub-trees, then the structure of these views is hidden in the encrypted document. If two views Va and Vb form overlapping sub-trees where Va is a sub-tree of Vb, then the structure of the sub-tree Va is hidden, but the structure of Vb is partially revealed. □

The above theorem shows that the design of the protection policy has implications on the visibility of the structure of the encrypted document. In the next section, we complete our description of the multi-encrypted document.

Creating a Multi-Encrypted Document: Step 2, Adding Meta-information

This section describes the second step towards building a multi-encrypted document EncΠ(D) of the document D, associating with D meta-information denoted by an XML document ACL(D). Consider a permission policy Π(D) as defined in Definition 2. The document ACL(D) consists of a number of <role> elements; a single <role> element defines read or write permissions for one or more nodes of the document D, corresponding to the views associated with this role. Specifically, the structure of each node in ACL(D) is as follows:

          
<role name="R">
    <permission name="read">
      <node xpath="..." key="..."/> 
    <permission>
    ...
</role>

Now, we describe the contents of <node> elements. Consider a fixed single permission pj = [Rj, read, VD i1 , VD i2 ,..., VD im , write, VD h1 , VD h2 ,...,VD hn ]. Let Vr be the union of all views from the read part, and let Vw be the union of all views from the write part. Let ξ(Vr) be the set of internal keys used to encrypt read views; ξ(Vr) = {κ1, κ2,...., κp}. Similarly, let ξ(Vw) be the set of internal keys used to encrypt write views; ξ(Vw) = {κp+1, κp+2,...., κp+r}. Now, we define XPath path expressions that represent elements of D which are encrypted with the same key. For i = 1, 2,..., p+r, let ti be the XPath path expression that defines ξ-1i). There will be p+r <node> elements. For the read permissions, we will have

<node xpath="ti" key="κic"/>

for i = 1, 2,..., p (where κic is the public key of the key pair κi). For the write permissions, we will have

<node xpath="ti" key="κie"/>

for i = p+1, p+2,..., p+r (where κie is the private key of the key pair κi).

The content of each <permission> element is encrypted with the public key of the external key pair associated with the corresponding role. This design supports pseudo-anonymity; only a user in role R can discover the permissions assigned to that role. For a read permission, the key attribute contains the public key(s) of the internal key pair(s), while for a write permission only the private key(s) of the relevant internal key pair(s) are included. Thus, a user in role R can decrypt the <permission> element for this role, obtain the set of nodes available to her, and collect all keys which are needed to decrypt the information stored in these nodes.

For example, consider a permission policy Π(D) = { [R1, read, V1], [R2, write, V2] }, where V1 = {s1, s2} and V2 = {s2, s3}. For i = 1, 2, 3, let ξ(si) = κi and ti be an XPath path expression that defines ξ-1i). In this scenario, there will be two <role> elements, shown below.

          
<role name="R1">
  <permission name="read">
     <node xpath="t1" key="κ1c"/>
     <node xpath="t2" key="κ2c"/>
  </permission>
</role>
<role name="R2">
  <permission name="write">
    <node xpath="t2" key="κ2e"/>
    <node xpath="t3" key="κ3e"/>
   </permission>
</role>
  

For the read permission associated with role R1, the public keys of internal key pairs κ1 and κ2 are provided, while for the write permission associated with role R2, the private keys of internal key pairs κ2 and κ3 are made available. (Recall that <permission> elements are encrypted with the public keys of the corresponding external key pairs.) For example, for the nodes specified by XPath t2, both keys of the internal key pair κ2 are provided so that these nodes can be read and modified.

Creating a Multi-Encrypted Document: Final Step

In this section, we complete the description of the process of creating a multi-encrypted document. For a document D, and a policy Π(D) = {p1, p2,..., pt}, we define the multi-encrypted document EncΠ(D) that conforms to the permission policy Π(D). In this step, we will use Enc-1Π(D) as described in Section 3.4, and ACL(D) as defined in Section 3.5.

We assume that the document D is created by its owner, denoted as C. Peer C has a key pair κC and is provided with public keys of external key pairs associated with all roles specified in the permission policy Π(D). The document EncΠ(D) will be made available to other users, who will be able to test their permissions and make use of the regions of D made accessible to a role R as long as they are in this role (i.e. they have the private key of the external key pair associated with R).

Definition 6. Consider an XML document D, a set of roles Ψ, and a permission policy Π(D) as defined in Definition 2. Let κi be an external key pair associated with the role Ri. Finally, let κC be the creator's key pair. The multi-encrypted document EncΠ(D) is a triple [Enc-1Π(D), ACL(D) , CertD] where the certificate CertD contains the identification of the owner, the digital signature of the ACL(D) signed using κC, and the public key of the key pair κC (this certificate is signed by a certification authority). □

Now consider a user Q, who accesses the document EncΠ(D). Assume that Q is currently in role Ri, and so it has access to the private key of the external key pair κi. Q determines its permissions on D using the following procedure:

  1. Q retrieves the certificate CertD and uses it to determine the owner C of D (Q may verify this certificate by accessing the certificate authority). Once this certificate is verified, Q can trust that the public key of κC, stored in this certificate, belongs to C.
  2. Q accesses ACL(D), specifically the <role> element associated with role R; if such an element does not exist then Q does not have any permissions for D.
  3. Q tries to decrypt the <role> element for R with the private key of κR. If Q fails, the ACL(D) has been tampered with; if it is successful, then the nested <permission> element specifies Q's permissions over D.

There are several phases during which data may have been tampered with. First, the ACL(D) may have been changed. However, its integrity may be verified using the digital signature computed over ACL(D), which can be retrieved from the certificate CertD. Second, an impostor T who does not have the private key of the external key pair may attempt to modify the document D, and re-encrypt it with a different key. However, in such a case any user Q who has a read permission on D will realize that the document has been tampered with because it cannot be successfully decrypted with the corresponding private key.

Generalization of Roles

There are two generalizations of roles that can be introduced without necessitating any major changes of the formalism provided in this paper. First, we can introduce a partial acyclic order in the set of roles; we will say that role R1 is stronger than role R2 if all permissions associated with R2 are also available in R1. If the document owner has specified this order between roles, then in Definition 2 all views from R2 will also be (implicitly) present in R1.

The second generalization involves dynamic roles. Under this approach, the creator of a document may specify a new role R, and use it to define the permission policy. The techniques described in Section 3 can then be applied to this policy; in particular an external key can be generated for R, and Algorithms 1 and 2 can be used in the usual manner to create a multi-encrypted document. If a peer Q should be able to access parts of the document, then Q will have to be provided with the private key of the external key pair associated with R a priori via a secure channel.

Conclusions and Future Work

In this paper, we described the process of building multi-encrypted documents that provide secure and controlled access to XML documents. We first defined an algorithm which, for a given permission policy, finds the largest sub-trees that can be collapsed into a single encrypted node. Next, we specified a key assignment algorithm that allocates the minimum number of keys needed to provide controlled access to various regions of an XML document. These two algorithms are used to encrypt the original XML document and hide the largest possible parts of the document’s structure. For an XML document, the corresponding multi-encrypted document is created by identifying largest sub-trees of the document, generating keys and using these keys to encrypt parts of the document, creating meta-nodes and encrypting them. A complete, multi-encrypted document satisfies the pseudo-anonymity and protection requirements with the minimum number of keys, and so it provides a secure and controlled access to the document by numerous users.

In this paper, we leave for future work the problem of ensuring data integrity of a document (i.e. to detect when its contents have been modified). This problem can be addressed using Merkle hash functions, as defined in [Merkle 1989].

Our approach assumes that the AC policy is known at encryption time, and we intend to investigate strategies for allowing subsequent changes to the AC policy after the document has been initially published. We consider only read/write operations, and more work is required for updates. Finally, we will be working on implementing techniques described in this paper and using them for providing controlled access to shared documents in a peer-to-peer (P2P) environment.


Acknowledgments

The authors would like to acknowledge the assistance provided by Rick Giles with the proof for Theorem 1.


Bibliography

[Bertino 2002] E. Bertino and E. Ferrari. "Secure and Selective Dissemination of XML Documents". ACM Transactions on Information and System Security (TISSEC) 5(3):290-331, 2002.

[Bertino 2004] E.Bertino, B.Carminati, E.Ferrari, B. Thuraisingham, A. Gupta. "Selective and Authentic Third-Party Distribution of XML Documents". IEEE Transactions on Knowledge and Data Engineering (TKDE) 16(10):1263-1278, 2004.

[Carminati 2005a] Barbara Carminati, Elena Ferrari, Elisa Bertino. "Securing XML Data in Third-Party Distribution Systems". In Proc. of the 14th ACM Conference on Information and Knowledge Management (CIKM 2005), 99-106, 2005.

[Carminati 2005b] Barbara Carminati, Elena Ferrari, Elisa Bertino. "Assuring Security Properties in Third-party Architectures". In Proc. of the 21st IEEE International Conference on Data Engineering (ICDE 2005), 547-548, 2005.

[Carminati 2005c] Barbara Carminati, Elena Ferrari. "AC-XML Documents: Improving the Performance of a Web Access Control Module". In Proc. of the 10th ACM Symposium on Access Control Models and Technologies (SACMAT 2005), 67-76, 2005.

[Crampton 2004] J. Crampton. "Applying Hierarchical and Role-based Access Control to XML Documents". In Proc. 2004 ACM Workshop on Secure Web Services, 37-46, 2004.

[Damiani 2002] E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, and P. Samarati. "A Fine-Grained Access Control System for XML Documents". In ACM Transactions on Information and System Security 5(2): 169-202, 2002.

[Devanbu 2001] P. Devanbu, M. Gertz, A. Kwong, C. Martel, G. Nuckolls, and S.G. Stubblebine. "Flexible Authentication of XML documents". In Proc. of the 8th ACM Conference on Computer and Communications Security, 136-145, 2001.

[Ferraiolo] D.F. Ferraiolo and D.R. Kuhn. "Role Based Access Control". In Proc. of the 15th National Computer Security Conference, 1992. http://csrc.nist.gov/rbac/ferraiolo-kuhn-92.pdf.

[Kudo 2000] M. Kudo and S. Hada. "XML Document Security Based on Provisional Authorization". In Proc. Of the 7th ACM Conference on Computer and Communications Security (CCS '00), 87-96, 2000.

[Merkle 1989] R. Merkle. "One Way Hash Functions and DES". In Proc. of CRYPTO 89, 428-446, 1989.

[Miklau 2003] G. Miklau and D. Suciu. "Controlling Access to Published Data Using Cryptography". In Proc. of the 29th International Conference on Very Large Data Bases (VLDB 2003), 898-909, 2003.

[XML Encryption] XML Encryption Syntax and Processing. W3C Recommendation 10 December 2002. http://www.w3.org/Encryption/2001/.

[XPath] XML Path Language (XPath) Version 1.0. http://www.w3.org/TR/xpath.



Using Multi-Encryption to Provide Secure and Controlled Access to XML Documents

Tomasz Müldner [Jodrey School of Computer Science, Acadia University, Wolfville, Canada]
tomasz.muldner@acadiau.ca
Gregory Leighton [Department of Computer Science, University of Calgary, Calgary, Canada]
gleighto@cpsc.ucalgary.ca
Jan Krzysztof Miziołek [Centre for Studies on the Classical Tradition in Poland and East-Central Europe, Warsaw University, Warsaw, Poland ]
jkm@obta.uw.edu.pl