Among the more contentious changes between XSLT 1.0 and XSLT 2.0 are the more complex type system, the typing of variables and parameters, and the requirement for explicit casting between types. Can XSLT authors continue to work as they always have and ignore the typing-related changes? Well, no. Some of the complexities of the type system can be ignored by using those built-in types that have good support in XPath 2.0; some of the new functionality (such as declaring the type of parameters within a named template or function definition) can be useful. But the requirement for explicit casting is far more irritating than helpful to those working with the untyped data that is available to XSLT authors not using schema validation.
Of all the new features in XSLT [Extensible Stylesheet Language Transformations] 2.0, the ones that have caused the most contention — and worry — amongst existing XSLT authors are those involving typing: strong typing, static typing, and validation against schemas. These features have been introduced due to two general requirements: the requirement to align with W3C XML Schema, which entails using XML Schema’s type system and validation methods; and the requirement to support analysis of queries prior to their execution and warn authors of potential problems with them.
When XSLT authors express their concern about these changes, they are generally assured that, at the end of the day, if they don’t want to use the new type-related features that XPath 2.0 brings, they can always ignore them by using processors that are not schema aware and do not carry out static type checking.
In this paper, I’ll look at the extent to which this assurance is true: to what extent is it possible to ignore the new typing features in XPath 2.0? If they can’t be ignored, what are the kinds of changes that XSLT authors will have to get used to? And most importantly, do these changes offer users any real benefits?
This paper is based on the May 2003 XPath 2.0 and XSLT 2.0 Working Drafts. While most of the features discussed here are unlikely to change, some of the details may alter between these drafts and the Recommendations, particularly in response to comments addressed to email@example.com.
XSLT 2.0 offers two conformance levels for XSLT processors: Basic, which does not require support for understanding schemas, and Schema-Aware, which does. For the purposes of this paper, I am going to focus on XSLT authors who are not using schemas to validate the XML documents that they work with, either because such schemas do not exist or because they want their stylesheets to be portable to Basic XSLT processors. The built-in types defined in XPath 2.0, including all the built-in types from XML Schema, are still available in these stylesheets, but without schema validation, nodes are all untyped and no user-defined types are available. This means that schema-unaware XSLT 2.0 stylesheets do not benefit from some features that are undoubtedly useful in XSLT 2.0, in particular, the ability to create templates that match, with a simple pattern, groups of elements based on either their name (via a substitution group) or type.
I will also constrain the discussion to behavior without the XPath 1.0 compatibility mode set. When set, XPath 1.0 compatibility mode alters the way that values are converted to each other, the aim being that expressions that were valid in XPath 1.0 work in the same way as they did in XPath 1.0. While XPath 1.0 compatibility mode will doubtless prove useful in migrating from XSLT 1.0 to XSLT 2.0, I want to focus here on the end-point of that journey, particularly since some portions of a stylesheet (in particular, the code within a function definition) are never run under the backwards compatibility rules.
This paper will start with a survey of the typing-related differences between XPath 1.0 and XPath 2.0, to explain why XPath 1.0 is characterized as a weakly typed language while XPath 2.0 is characterized as a strongly typed language. The differences fall into three categories, which I’ll go on to look at in more detail. First, they differ in the type system that’s used: the XPath 2.0 type system is based on the XML Schema type system, and I’ll discuss how to use these types in XSLT 2.0. Second, XPath 1.0 and XPath 2.0 differ in the rules used when converting between types; I’ll survey what these rules mean in terms of the expressions that you can use. Third, in XPath 2.0 variables and parameters are typed whereas in XPath 1.0 they are not; again, I’ll discuss the impact of this change on XSLT authors. Finally, I’ll draw some conclusions about the relative ease of writing stylesheets in XSLT 2.0 due to these changes.
So what are the type-related differences between XPath 1.0 and XPath 2.0? Perhaps the biggest difference is that XPath 1.0 is said to be a weakly typed language while XPath 2.0 is said to be a strongly typed language. The term “strongly typed language” is fairly nebulous, meaning different things to different people. A good list of possible overlapping definitions for “strongly typed language” can be found at http://c2.com/cgi/wiki?StronglyTyped:
Let’s look at each of these in turn to see which characteristics of XPath 2.0 make it a “strongly typed language”.
A language is strongly typed if type annotations are associated with variable names, rather than with values. If types are attached to values, it is weakly typed.
In XPath 2.0, type annotations are associated with both variable names and with values. Both atomic values (scalars) and nodes are annotated with types. When they are declared (using XSLT), variables and parameters can be assigned types using the as attribute. If you try to assign a value to a variable or parameter and the type of the value does not match the type of the variable or parameter, then you will get a type error. This differs from XPath 1.0, in which variables do not have types but values do.
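As a minimal sketch of typed declarations, the following stylesheet declares a typed global variable and a typed template parameter. The names max-items, show-limit, and limit are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

  <!-- Typed global variable (hypothetical name): the value must be
       a single xs:integer -->
  <xsl:variable name="max-items" as="xs:integer" select="10" />

  <!-- Typed parameter on a named template (hypothetical names): a caller
       that passes a value of the wrong type, such as the string 'ten',
       gets a type error rather than a silent conversion -->
  <xsl:template name="show-limit">
    <xsl:param name="limit" as="xs:integer" select="$max-items" />
    <xsl:value-of select="string($limit)" />
  </xsl:template>

</xsl:stylesheet>
```

Leaving out the as attribute gives the XSLT 1.0 situation, in which the variable itself carries no type and only the value does.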
A language is strongly typed if it contains compile-time checks for type constraint violations. If checking is deferred to run time, it is weakly typed.
A language is strongly typed if there are compile-time or run-time checks for type constraint violations. If no checking is done, it is weakly typed.
XPath 2.0 does define how to carry out compile-time checks of type constraint violations, otherwise known as static type checking. Static type checking is an optional feature in XPath 2.0, and is not required at either XSLT conformance level, but the language and the type system have been designed specifically to enable processors to support it if they wish. In particular, XPath 2.0 ensures that it is always possible to identify the type of the result of an expression during analysis even when that expression is polymorphic (returns different results for operands of different types). If the processor does not support static type checking, it will report type errors at run time instead.
In XPath 1.0, there are fairly few situations when there are type constraints to violate (since most types can be converted to each other on demand) — checking that a predicate is actually being applied to a node set (rather than to a string or number) is one example of type checking in XPath 1.0. There is no statement in XPath 1.0 about when type checking occurs, so as with XPath 2.0, it’s really up to the processor. However, since variables are not typed, it can be quite hard work to do type checking during the analysis phase.
A language is strongly typed if conversions between different types are forbidden. If such conversions are allowed, it is weakly typed.
A language is strongly typed if conversions between different types must be indicated explicitly. If implicit conversions are performed, it is weakly typed.
Both XPath 1.0 and XPath 2.0 allow you to convert between (some) different types. As we’ll see in more detail later, in XPath 2.0 most conversions must be carried out explicitly using the constructor functions (such as xs:date() and xs:double()) or a cast as expression whereas in XPath 1.0 most conversions are carried out implicitly.
A language is strongly typed if there is no language-level way to disable or evade the type system. If there are casts or other type-evasive mechanisms, it is weakly typed.
Both XPath 1.0 and XPath 2.0 allow you to make explicit casts of atomic values to different types. XPath 2.0 also has a treat as expression to evade the static type checking of sequences and nodes, so under this definition both versions of XPath would be characterized as weakly typed.
A language is strongly typed if it has a complex, fine-grained type system with compound types. If it has only a few types, or only scalar types, it is weakly typed.
XPath 2.0 certainly has a complex, fine-grained type system, based on that defined in XML Schema, while XPath 1.0 only has four types (six if you include XSLT 1.0’s result tree fragments and the “foreign object” type). On the compound-type front, it’s notable that while you can define the type of the items in a sequence, a sequence is not itself annotated with a type. For example, you can declare that a variable holds a sequence of xs:string items, but the sequence itself hasn’t got a type, such as “list of strings”.
A language is strongly typed if the type of its data objects is fixed and does not vary over the lifetime of the object. If the type of a datum can change, the language is weakly typed.
In a way, this definition does not apply in XPath because XPath is a functional language in which data cannot change; in any case, it is certainly true that the type of a piece of data is fixed in both versions of XPath.
The following table summarizes these characteristics:
| Characteristic | XPath 1.0 | XPath 2.0 |
|---|---|---|
| typed variables | no (weak) | yes (strong) |
| static type checking | implementation dependent (weak) | implementation dependent (weak) |
| type checking | yes (strong) | yes (strong) |
| conversions allowed | yes (weak) | yes (weak) |
| explicit casting required | no (weak) | mostly (strongish) |
| casting supported | yes (weak) | yes (weak) |
| complex type system | no (weak) | yes (strong) |
| type of value fixed | yes (strong) | yes (strong) |
This summary shows that while XPath 1.0 is (by most definitions) a weakly typed language and XPath 2.0 is (by most definitions) a strongly typed language, this distinction actually arises because of three main differences in XPath 2.0:
In the rest of this paper, I will look at each of these new features in XPath 2.0, the impact that it has on the stylesheets that you create and the benefits (and costs) that it brings.
Before we launch into the details of using XPath 2.0’s type system, let’s have a quick look at the way it works.
In XPath 2.0, everything is a sequence of zero or more items. Sequences cannot hold other sequences, and a singleton sequence, containing only one item, is equivalent to the single item that it contains. Items come in two flavors, nodes and atomic values.
Nodes are familiar from XPath 1.0 and represent logical structures within an XML document, such as elements and attributes. In XPath 1.0, we are used to using the name and string value of a node. In XPath 2.0, we can also use the type of a node and use the typed value of a node rather than its string value. In the kind of documents that we’re looking at in this paper (those that have not been validated against a schema or DTD), element nodes will have the type xs:anyType, and attribute nodes will have the type xs:anySimpleType. The typed value of all nodes in these circumstances is the same as the string value of the node, but with the type xdt:untypedAtomic.
The type of “untyped” nodes is likely to change in the next Working Draft (to xdt:untypedAny for elements and xdt:untypedAtomic for attributes), but in effect they will be treated as described here as their typed value will still be the string value of the node with the type xdt:untypedAtomic.
Atomic values are scalars like the strings, numbers, and booleans in XPath 1.0. In XPath 2.0, every atomic value has a type from amongst the atomic types that are built in to XML Schema, plus a few extra that have been defined for the purposes of XPath 2.0. The following diagram shows the hierarchy of atomic types that are built in to XPath 2.0.
This hierarchy is daunting for those used to the three atomic types of XPath 1.0. While there’s no need to use a particular type if you don’t need it, authors will need to have some awareness of what the type hierarchy does and does not contain when writing their own functions, and need to know how it’s organized in order to use the casting rules that we look at later. In fact, the vast majority of these 45 types can be avoided. The most commonly used types will probably be:
These are the types for which there is good support in the built-in function library in XPath 2.0. There are some types that should be avoided solely because of the lack of support for them in the built-in function library. For example:
(current-date() + xdt:yearMonthDuration('P1Y')) + xdt:dayTimeDuration('P1D')
These limitations aside, the date/time, duration and qualified name support in XPath 2.0 in particular will allow us to carry out some transformations with a lot more ease, portability, and accuracy than currently. Rather than employing extensions or complicated templates to work out the difference between two times, for example, we can just subtract them from each other; rather than having to parse dates in consecutive <xsl:sort> elements in order to sort by them, we can just point to the date and let the processor do the work:
<xsl:for-each select="Trans">
  <xsl:sort select="xs:date(@Period)" />
  ...
</xsl:for-each>
Only some of the types (namely strings and the numeric types) have corresponding literals that allow you to create values easily. To create values of most of the types or to cast to a particular type, you will need to use the constructor functions that we’ll look at in the next section. To use those functions, you will need to declare the xs namespace of http://www.w3.org/2001/XMLSchema. So the first impact of typing on XSLT 2.0 is that most stylesheets will need to have an extra namespace declaration and an exclude-result-prefixes attribute to ensure that the namespace isn’t carried through into the result document:
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">
  ...
</xsl:stylesheet>
If the stylesheet refers to xdt:yearMonthDuration or xdt:dayTimeDuration, then it may also need to declare the xdt namespace of http://www.w3.org/2003/05/xpath-datatypes:
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:xdt="http://www.w3.org/2003/05/xpath-datatypes"
                exclude-result-prefixes="xs xdt">
  ...
</xsl:stylesheet>
These additional namespace declarations are a little bit tedious, but the document element of a stylesheet is usually generated automatically or copied en masse anyway, so they are not too much of a burden for XSLT 2.0 authors.
As we’ve seen, one of the major differences between XPath 1.0 and XPath 2.0 is that while in XPath 1.0 type conversions are done implicitly (where they make sense), in XPath 2.0 they have to be done explicitly. In this section, we’ll look at what that means in practice for an XSLT author.
In XPath 1.0, string, number, and boolean values could be converted back and forth implicitly and automatically, depending on the type that was required, or explicitly using the functions string(), number() and boolean(). Node sets could be converted to any of those types by ignoring all but the first of the nodes if applicable, but none of the other types could be converted to a node set. The rules for the conversion between the types are summarized in the following table.
| from type / to type | string | number | boolean |
|---|---|---|---|
| string | - | string parsed as a number literal; NaN if this fails | false if string is empty; true otherwise |
| number | number converted to number literal; 'NaN' if not a number, 'Infinity' and '-Infinity' for positive and negative infinity | - | false if number is 0 or NaN; true otherwise |
| boolean | 'true' if boolean is true, 'false' otherwise | 1 if boolean is true, 0 otherwise | - |
| node set | string value of first node in node set | string value of first node in node set converted to a number | false if node set is empty; true otherwise |
These same conversion rules are used in XPath 1.0 whenever a value needs to be converted to a different type. And it is always possible to convert from one atomic value (be it a string, number, or boolean) to another, even if the conversion gives an unusable result (for example, when converting the string 'rubbish' to a number you get the number NaN). If a particular type is expected (for example, as the argument to a particular function or as the result of an expression held in a particular XSLT attribute) then the value will be converted to that type automatically. A few functions in XSLT 1.0, namely, id(), document(), and key(), are polymorphic, bypassing the usual conversion of node sets.
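The XPath 1.0 conversion rules in the table can be seen in a small XSLT 1.0 stylesheet such as the following sketch. It is only an illustration of the rules above, and the expected output of each instruction is noted in the comment beside it.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- string to number: an unparseable string becomes NaN -->
    <xsl:value-of select="number('rubbish')" />        <!-- NaN -->
    <xsl:text>&#10;</xsl:text>
    <!-- number to string: positive infinity -->
    <xsl:value-of select="string(1 div 0)" />          <!-- Infinity -->
    <xsl:text>&#10;</xsl:text>
    <!-- string to boolean: any non-empty string is true -->
    <xsl:value-of select="boolean('false')" />         <!-- true -->
    <xsl:text>&#10;</xsl:text>
    <!-- boolean to number, implicitly, as an arithmetic operand -->
    <xsl:value-of select="true() + 1" />               <!-- 2 -->
    <xsl:text>&#10;</xsl:text>
    <!-- number to string, implicitly, as an argument to concat() -->
    <xsl:value-of select="concat(1 + 1, ' items')" />  <!-- 2 items -->
  </xsl:template>
</xsl:stylesheet>
```

None of these instructions raises an error in XSLT 1.0; the conversions simply happen whenever a particular type is expected.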
You can cast between types in XPath 2.0 with two methods. The simplest method is to use the built-in constructor functions: there’s one for each type, named after that type. For example, to cast to an xs:date you can use the xs:date() function. The second method is to use a cast expression; this is a bit more flexible in that if the value you’re casting is an empty sequence, you will get an empty sequence as a result rather than a type error. For example, if you weren’t sure whether there was a date attribute or not, and you wanted to create either an xs:date value or an empty sequence, you should use:
@date cast as xs:date?
Of course, there are many more types in XPath 2.0, and, consequently, the rules for converting between types are more complex. Whether a type can be cast to another type is based on three rules:
As a consequence of these three rules, it is possible to cast between any two types as long as it is possible to cast between the primitive types from which they are derived. To cast a source value to a target type, cast the value “up” to the primitive type of the source type, “across” to the primitive type of the target type and then “down” the hierarchy to the target type. For example, an xs:language value can be cast to an xs:Name, as long as the xs:language satisfies the pattern for an xs:Name, because they share the common primitive supertype of xs:string. Similarly, an xs:integer can be cast to an xs:boolean because it’s legal to cast from xs:decimal (the primitive supertype of xs:integer) to xs:boolean.
XPath 1.0 not only defines how to cast between the “atomic types” of string, number, and boolean, but also how to cast from a node set to one of these values. When casting to a string or number, the string value of the first node in the node set (in document order) is used; when casting to a boolean, the result is true if the node set contains any nodes and false if it is empty.
In XPath 2.0, it is a type error if you try to cast a sequence (which in XPath 2.0 can hold any mixture of atomic values and nodes) that holds more than one item to a single atomic value. This means that you must explicitly use the predicate [1] if you want the first item in the sequence. This is arguably a useful new constraint for XSLT authors as it’s often not clear that a particular expression is using only the first node in a node-set in XPath 1.0; making it explicit makes code easier to understand and debug.
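A brief sketch of the difference, assuming an Invoice element with several LineItem children, each carrying a hypothetical quantity attribute (the attribute and the variable name are illustrative only):

```xml
<xsl:template match="Invoice">
  <!-- xs:integer(LineItem/@quantity) would be a type error whenever the
       Invoice holds more than one LineItem, because the argument would be
       a sequence of several attribute nodes -->

  <!-- Selecting the first LineItem explicitly makes the intent visible -->
  <xsl:variable name="first-quantity" as="xs:integer"
                select="xs:integer(LineItem[1]/@quantity)" />
  <xsl:value-of select="string($first-quantity)" />
</xsl:template>
```

The fragment assumes that the xs prefix is bound to the XML Schema namespace on the enclosing stylesheet element.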
Whether the value of a single node can be cast to a particular atomic value depends on the type of the node; in the “minimal” case that we are looking at, nodes are always untyped — we’ll be looking at the casting of untyped nodes in the next section.
If a type cannot be cast to another type, then attempting to do so will generate a type error. XSLT 2.0 authors need to be aware of which conversions are possible in order to avoid these type errors, which in turn requires familiarity with the XPath 2.0 type hierarchy. These type errors can be detected statically if an implementation wishes to.
In cases of conversions down the type hierarchy, or from an xs:string to another type, if the value doesn’t adhere to the facets of the target type, then you will get a dynamic error; again, to avoid these XSLT authors need to be familiar with the constraints on the values of the types that they use. An implementation can detect statically problems that occur when a literal (such as a string) is cast to a value and that literal isn’t a valid representation, for example:
You can prevent dynamic errors of this kind by checking whether a value is castable before casting it. For example, the following returns the xs:date generated by appending '-01' to the Period attribute if that is a valid date, and an empty sequence otherwise:
if (concat(@Period, '-01') castable as xs:date) then xs:date(concat(@Period, '-01')) else ()
Whether the generation of type errors and dynamic errors arising from type conversions is a benefit or not probably depends on the kind of environment in which you are performing transformations.
One of the general philosophies behind the design of XSLT 1.0 was that, should a processor wish, all errors could either be detected during analysis or recovered from. In processors that have a “recover from everything” philosophy (which are usually client-side processors), if the stylesheet gets through the compilation phase without an error then it will not give you an error during runtime. Put another way, if a stylesheet doesn’t give you an error with one XML document, it will not give you an error with any other XML document. The lack of errors does not mean that the result of the transformation is meaningful, but at least you are guaranteed some kind of result.
With XSLT 2.0, it’s no longer possible for a processor to have this philosophy. XPath 2.0 expressions may generate dynamic errors or type errors at run time unless you explicitly catch them by testing for castability before any explicit casting. On the plus side, the generation of errors means that the transformation simply won’t run if the data isn’t formatted as expected, which cuts down on (but by no means eliminates) “false positive” results where the transformation produces a meaningless output. On the minus side, as an XSLT author, you no longer have any guarantee that just because a transformation doesn’t produce errors with one document it will also be error free with another document; testing against multiple representative documents becomes much more important.
The major difference between XPath 1.0 and XPath 2.0 (so far as converting between types is concerned) is that in most cases the only kind of casting that happens implicitly in XPath 2.0 is casting up the hierarchy (from subtype to supertype).
One of the ways in which this is most apparent, and most irritating, is that despite the fact that almost every value can be cast to a string, that casting has to be done explicitly. For example, in the following XSLT code, I summarize each <Invoice> element with a string that numbers the invoice and counts the number of <LineItem> elements that it contains:
<xsl:for-each select="Invoice">
  <xsl:value-of select="concat(position(), ': ', count(LineItem), ' items.')" />
</xsl:for-each>
In XSLT 2.0, this will raise a type error because the concat() function expects strings as arguments and the position() and count() function calls both return integers. To make the call to concat() work in XSLT 2.0, you must cast the integers to strings explicitly using the string() or xs:string() functions:
<xsl:for-each select="Invoice">
  <xsl:value-of select="concat(string(position()), ': ', string(count(LineItem)), ' items.')" />
</xsl:for-each>
Another example of where explicit casting is required comes from some XML that I’ve been dealing with recently that holds account data roughly in the form:
<Trans AccNo="401000" TransType="ACT" Period="2000-05" Amount="-691126.97" />
As you can see, the Period attribute is in the format of an xs:gYearMonth. I need to get from that month to the first month of the quarter that contains the month: January for January-to-March, April for April-to-June, and so on. In XSLT 1.0, I can do this with:
floor((substring(@Period, 6, 2) - 1) div 3) * 3 + 1
This takes the month indicated in the Period attribute (the 6th and 7th characters), subtracts 1 from the value (to give an integer between 0 and 11), then divides by 3 and floors the result (to give an integer between 0 and 3), multiplies by 3 (to give one of the integers 0, 3, 6, or 9), and adds one (to give one of the integers 1, 4, 7 or 10, corresponding to January, April, July, or October).
In XSLT 2.0, this is a type error. The substring() function returns an xs:string, and you can’t subtract an xs:integer (or indeed a value of any type) from a string. To perform the calculation, I must explicitly cast the result of the substring() function to a numeric type, such as xs:integer as follows:
floor((xs:integer(substring(@Period, 6, 2)) - 1) div 3) * 3 + 1
XPath 2.0 introduces an idiv operator for dividing an integer by another integer to return an integer; in this example, it would be better to use idiv than the floor() function, as follows, since the result of the expression is then an integer rather than a decimal:
((xs:integer(substring(@Period, 6, 2)) - 1) idiv 3) * 3 + 1
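To make the arithmetic concrete, here is the same expression worked through for the Period value '2000-05' from the sample transaction above. The literal string stands in for @Period so that the expression can be evaluated on its own.

```xpath
(: Period = '2000-05', taken from the sample Trans element :)
((xs:integer(substring('2000-05', 6, 2)) - 1) idiv 3) * 3 + 1

(: substring('2000-05', 6, 2)  gives the string '05'
   xs:integer('05')            gives the integer 5
   5 - 1                       gives 4
   4 idiv 3                    gives 1
   1 * 3                       gives 3
   3 + 1                       gives 4, that is, April :)
```

Every intermediate value after the cast is an xs:integer, so the result is too.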
Why is this casting necessary in XPath 2.0 when in XPath 1.0 numbers are automatically turned into strings and strings into numbers as required? Well, in XPath 2.0 the arithmetic operations, such as subtraction, can be used with a variety of operands (they are polymorphic) and the type of the result depends on the type of the operands. If you subtract an xs:integer from another xs:integer, you will get an xs:integer. If, on the other hand, you subtract an xdt:yearMonthDuration from an xs:date, you will get another xs:date. Many of the operators in XPath 2.0 are polymorphic, for example:
Functions in XPath 2.0 are not polymorphic, but future versions of the language might introduce polymorphic functions so the rules governing the conversion of function arguments are roughly the same as those governing the conversion of operator operands.
Polymorphism raises problems for implementations because they have to be able to work out which version of the operator (or function) should be used. Different programming languages handle this in different ways, in general using information about:
For example, in the above case where the xs:integer 1 is subtracted from an xs:string '03' (say) and the result used on the left-hand side of an idiv expression, the relevant information is:
Based on this analysis, it’s possible to work out that the string passed as the left-hand side operand needs to be cast to an xs:integer.
In XPath 2.0, only the possible versions of the operator and the types of the operands are used to work out which version of the operator to use. The actual values of the operands aren’t used because they aren’t available during static analysis (so, relying on them would make static type checking impossible). The required type of the result isn’t used because to do so would place a great burden on the processor; unfortunately, this means that the burden of casting is placed on the shoulders of the XSLT author instead.
There are three exceptions to the general rule that casting must be explicit unless it is from a subtype to a supertype, which we look at here.
The first exception is with numeric values. There are three primitive numeric types in XPath 2.0: xs:decimal, xs:float, and xs:double. If implicit typing were only permitted from subtype to supertype, it would be impossible to use a number of one primitive type where another was expected. For example:
substring(@Period, 6, 2)
Casting in the other direction isn’t implicit, however. Most importantly, xs:double values can’t be implicitly cast to xs:integer values, which means that the results of arithmetic involving untyped nodes (which, as we’ll see, are treated as doubles) aren’t automatically treated as integers, for example:
(@value * 2) idiv 2
xs:integer(@value * 2) idiv 2
The second exception is the implicit casting of values to their effective boolean value in certain circumstances. The effective boolean value of a sequence is defined in a similar way to the conversion to boolean values in XPath 1.0. If a sequence is empty or if it consists of a single atomic value that is the xs:boolean value false, an empty string, the numeric value 0, or an xs:float or xs:double NaN, then it is converted to an xs:boolean value of false, otherwise to the xs:boolean true.
The effective boolean value of a value is different from what you get when you cast a value to a boolean with xs:boolean(). Casting with xs:boolean() will give you a type error if the value is a sequence holding more than one value, or if it is of a type that cannot be cast to xs:boolean, and casting the string 'false' to a boolean with xs:boolean() will return the xs:boolean false. The boolean() function can still be used to explicitly get the effective boolean value of a value in the same way as in XPath 1.0.
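The contrast is easiest to see with the string 'false'. The following are separate expressions, shown together only for comparison; the results in the comments follow from the rules just described.

```xpath
boolean('false')      (: true: the effective boolean value of a non-empty string :)
xs:boolean('false')   (: false: the string is cast as a boolean literal :)
boolean('')           (: false: the effective boolean value of an empty string :)
boolean(0)            (: false: the effective boolean value of the number 0 :)
```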
The effective boolean value of the value returned by an expression is used in the following situations:
The effective boolean value isn’t used when a function requires a boolean argument. For example, the escape-uri() function escapes a URI, with its first argument being the URI to escape and the second being a boolean flag that indicates whether the reserved characters (such as / and #) are escaped or not. When using this function, the value of the second argument must be an xs:boolean value: if you try to use a sequence instead, for example, you will get a type error unless the sequence contains only one item and that item is an xs:boolean value.
In XPath 2.0, the general rule is that if a node is untyped (as each node will be if you don’t use a schema or if your XSLT processor isn’t Schema-Aware), it is converted to the required type of the particular operation that it’s involved in. For example, the string-pad() function expects an integer as its second argument; if you call it with:
string-pad(' ', @indent)
In general, the implicit conversion is very useful. It works exactly as it would if you were validating the untyped value against the specified type. Under the XPath 2.0 rules, as long as a node looks like a particular type, it can be treated as that type when it’s passed as the argument to a function. There’s no problem treating a year attribute with the value '2003' as both an xs:gYear and an xs:integer depending on what you need to do with it, for example.
One thing to watch out for, however, is that when casting the untyped value of a node to a sequence, the value is not tokenized to create a sequence, but instead is interpreted as if it were a single value. For example, if you have:
<dress sizes="6 8 10 12 14 16"> ... </dress>
<xsl:function name="my:dress-fits">
  <xsl:param name="sizes" as="xs:integer+" />
  ...
</xsl:function>
my:dress-fits(for $s in tokenize(dress/@sizes, '\s+') return xs:integer($s))
The polymorphic operators sometimes require you to explicitly state the type of the node because there is no single required type for operands to polymorphic operators, which makes it hard for a static type checker to tell which type an untyped node should be converted into in order to get a result. For example, as we saw earlier, there are several possible combinations of acceptable types when one value is subtracted from another. Different rules govern untyped values with different kinds of polymorphic operators; in most cases these rules are reasonable, given that the type to which the untyped node is cast is fairly arbitrary, but in some cases it causes some strange effects:
The value comparison and general comparison operators perform similar operations, but the value comparison operators can only compare two atomic values at a time (whereas the general comparison operators work as they do in XPath 1.0, by comparing two sequences to work out whether any pair of comparisons evaluates as true), and they are a lot stricter about the types of the values. For the purposes of a value comparison, any untyped node will be converted into an xs:string value. For example, even if the indent attribute has the string value '2', the result of the comparison:
@indent ne 0
xs:integer(@indent) ne 0
@indent != 0
If you compare an untyped node to an atomic value with a general comparison, the node is generally cast to the type of the value to which you’re comparing the node (the exception is when the comparison is with a numeric value, in which case the node’s value is cast to an xs:double — this means that all numeric values can be compared to each other). If two untyped nodes are compared, as in:
@width > @height
number(@width) > number(@height)
This works in much the same way as XPath 1.0 when it comes to = and != but differently from XPath 1.0 for the operators <, <=, >, and >=. Operands to these operators were converted to numbers in XPath 1.0, which didn’t support alphabetical comparison between strings. Thus, all the general comparison operators work consistently with each other, but not with what you might expect from XPath 1.0.
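As a minimal sketch of this difference, consider a hypothetical box element whose width and height attributes are untyped. The sketch assumes a processor that is not schema-aware and that follows the rule in the final XPath 2.0 Recommendation, under which two untyped operands of a general comparison are both cast to xs:string; the May 2003 drafts may differ in detail.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <box width="10" height="9"/>, not schema-validated -->
  <xsl:template match="box">
    <!-- Both operands are untyped, so they are compared as strings.
         '10' sorts before '9', so this test is false. -->
    <xsl:if test="@width > @height">wider (string comparison)</xsl:if>
    <!-- number() forces a numeric comparison: 10 > 9, so this test is true. -->
    <xsl:if test="number(@width) > number(@height)">wider (numeric comparison)</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

An XPath 1.0 processor would treat both tests as numeric comparisons, so a stylesheet that relies on the first form can change behaviour without any error being reported.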
Untyped nodes that are involved in arithmetic operations are usually converted to xs:double, no matter what their values look like, what the other operand looks like, or what the required type of the result of the expression is. (The one exception is that the arguments to idiv are converted to xs:integer instead.) If, for example, you have a date attribute with the value '1999-11-16', then the XPath:
@date - current-date()
xs:date(@date) - current-date()
Similarly, if you use the XPath:
string-pad(' ', @indent + 2)
string-pad(' ', xs:integer(@indent) + 2)
When both operands to an arithmetic operator are xs:integer values, the result is an xs:integer; when the two operands are numeric values of different types, the result is the least specific of the types.
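The result types of numeric arithmetic can be checked with instance of. The following is a sketch only, written against the final XPath 2.0 Recommendation rather than the May 2003 drafts; the literals are illustrative, and the stylesheet can be run against any input document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- xs:integer + xs:integer gives an xs:integer: true -->
    <xsl:value-of select="(2 + 3) instance of xs:integer"/>
    <xsl:text>&#10;</xsl:text>
    <!-- xs:integer + xs:decimal gives an xs:decimal, not an xs:integer: false -->
    <xsl:value-of select="(2 + 3.5) instance of xs:integer"/>
    <xsl:text>&#10;</xsl:text>
    <!-- xs:integer + xs:double (3.5e0 is a double literal) gives an xs:double: true -->
    <xsl:value-of select="(2 + 3.5e0) instance of xs:double"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```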
This treatment of untyped nodes by the arithmetic operators is one of the areas where explicit casting is most likely to be required, either to get date/time arithmetic rather than numeric arithmetic, or because the result of the arithmetic needs to be an xs:integer rather than an xs:double.
In XSLT 2.0, the variable-binding elements <xsl:variable>, <xsl:param>, and <xsl:with-param> all take an as attribute that specifies the type of the value of the variable or parameter. The value of the as attribute is a SequenceType, which is a pattern that matches values of particular types, for example:
When you declare the type of a variable or parameter, then the value that you supply for that variable or parameter is cast to that type as long as it can be cast implicitly using the rules described above. In other words, if the type of the value is a subtype of the type declared for the variable or if the value is untyped, then the value will be cast as you’d expect, for example:
<xsl:variable name="date" select="current-date()" as="xs:date" />
<xsl:variable name="date" select="@date" as="xs:date" />
<xsl:variable name="date" select="'2003-04-01'" as="xs:date" />
<xsl:variable name="date" select="xs:date('2003-04-01')" as="xs:date" />
<xsl:variable name="date" as="xs:date">2003-04-01</xsl:variable>
The value of the text node is untyped and can, therefore, be cast to a date implicitly.
If you don’t specify an as attribute then no conversion takes place. In the following cases, the date variable is still recognisable as an xs:date because the type of the selected value is an xs:date:
<xsl:variable name="date" select="current-date()" />
<xsl:variable name="date" select="xs:date('2003-04-01')" />
To set the date variable to the xs:date value of an untyped date attribute, the following are equivalent:
<xsl:variable name="date" select="@date" as="xs:date" />
<xsl:variable name="date" select="xs:date(@date)" />
So in fact, when setting variables or passing values to parameters with <xsl:with-param>, there is no need to use the as attribute unless the variable or parameter’s value is being set using the content of the variable-binding element, although doing so might help an XSLT processor to perform static analysis of the stylesheet.
Specifying the type of parameters when they are declared is another matter, however. On the <xsl:param> element, the as attribute specifies the required type of the value that’s specified for a parameter. If the wrong type of value is passed for the parameter, then a type error will be raised. Since the declaration of the parameter and the setting of its value occur in different places, this is a good thing — a place where type checking can help ensure that the functions and named templates that you define are being called with the correct types of arguments. What’s more, declaring the type of the parameters simplifies the code within the function definition or template because you won’t need to use explicit casting to use the parameter value (assuming, that is, you declare it with the correct type).
On the other hand, you need to be careful about the types that you use for parameters in order to make the function or template easy for people to call. For example, if you were writing a function with an argument that has to be a positive integer, you might be tempted to declare the type of that parameter as xs:positiveInteger. If you did so, however, those using your function (including yourself!) would have to use explicit casts to create values of the correct type. The general rule of being generous in what you accept and strict in what you produce applies here: using general types, such as xs:string and xs:double, for arguments and specific types, such as xs:language and xs:positiveInteger, for return values will make your functions and templates a lot easier to call and use.
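The following sketch illustrates the point with a hypothetical function, my:repeat, in a made-up namespace. It is written against the syntax of the final XSLT 2.0 Recommendation (in particular, xsl:sequence to return the result), which may differ from the May 2003 drafts, and it assumes that every item element carries an untyped indent attribute.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical function: returns $count copies of $text.
       $count is declared as xs:integer rather than xs:positiveInteger so that
       integer literals and untyped attributes can be passed without a cast. -->
  <xsl:function name="my:repeat" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <xsl:param name="count" as="xs:integer"/>
    <xsl:sequence select="string-join(for $i in 1 to $count return $text, '')"/>
  </xsl:function>

  <xsl:template match="item">
    <!-- The literal 2 is already an xs:integer. -->
    <xsl:value-of select="my:repeat(' ', 2)"/>
    <!-- The untyped @indent is converted to xs:integer implicitly. -->
    <xsl:value-of select="my:repeat(' ', @indent)"/>
    <!-- Had $count been declared as xs:positiveInteger, the first call
         would have to be written my:repeat(' ', xs:positiveInteger(2)). -->
  </xsl:template>
</xsl:stylesheet>
```

Declaring the parameter as xs:integer means the function body has to cope with zero or negative counts itself; here the range 1 to $count is simply empty in those cases, so the function returns an empty string.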
This paper has reviewed the type-related changes that XPath 2.0 brings and their impact on XSLT authors. There are three type-related changes in XPath 2.0:
The complexity of the type system is largely a consequence of XML Schema’s datatype specification, with a few extra types (notably, xdt:yearMonthDuration and xdt:dayTimeDuration) added to adapt it for computation rather than validation. However, even with these changes, the relationships of the types within the XML Schema type hierarchy are not well suited to the strong typing rules that XPath 2.0 employs. This is evident in several places where the general rules of the XPath 2.0 type system are broken to make life easier for users, for example:
As we’ve seen, explicit casting makes life harder for the XSLT author. When working with typed data, or with data-oriented XML in which values fit nicely onto the type system, the implicit casting rules work reasonably well. However, there are four not uncommon situations where XSLT authors working with untyped data will find themselves having to explicitly cast values:
These casts add to the length and complexity of the code and require the XSLT author to know, for example, which functions require xs:integer arguments and which require xs:double arguments.
One argument for the strong typing introduced to XPath 2.0 is that it should help the processor identify errors at analysis time and, thus, help the author create code that does the right thing. The kinds of errors that users are interested in are those where they have made mistakes in their algorithm or function calls. But XPath 2.0 processors will raise errors when there is nothing wrong with the algorithm used. An example we looked at earlier was:
((substring(@Period, 6, 2) - 1) idiv 3) * 3 + 1
There is nothing wrong with the algorithm being used here. When the XSLT 2.0 processor raises a type error because an xs:string can’t be an operand of a subtraction, it’s more likely to be an irritation to the XSLT author than an assistance. Even if an XSLT 2.0 processor doesn’t implement the static typing feature of XPath 2.0, and can recognize dynamically that the result of the substring() function can be cast to an xs:integer, it still has to raise the type error. And even when the cast is added:
((xs:integer(substring(@Period, 6, 2)) - 1) idiv 3) * 3 + 1
if (substring(@Period, 6, 2) castable as xs:integer) then ((xs:integer(substring(@Period, 6, 2)) - 1) idiv 3) * 3 + 1 else xs:double('NaN')
The requirements for explicit casting in XPath 2.0 can be ameliorated by taking two steps.
First, when designing markup languages, the values of nodes should have a format that corresponds to one of the well-supported XPath 2.0 types. If the Period attribute in the example above were in the format of an xs:date rather than an xs:gYearMonth, then the code:
((get-month-from-date(@Period) - 1) idiv 3) * 3 + 1
Second, XML documents should be validated so that nodes are annotated with their type. If the width and height attributes are annotated as having xs:integer values, for example, then the comparison:
@width > @height
But these are steps that few XSLT authors will be able to take. XSLT authors generally have no control over the format of the XML that they deal with, which means that string parsing of element and attribute values is commonplace. Getting typed data is even rarer, especially given that there is no requirement for XSLT 2.0 processors to support the annotation of nodes, even with just the built-in types.
To conclude, the strong typing of XPath 2.0 offers few benefits, and a fair burden, to those XSLT authors working with untyped data.
The additional date/time and duration types make date/time arithmetic a lot easier to carry out, and the xs:QName type might also prove useful in some transformations, but as a whole the type hierarchy contains a great many more types than are required or actively supported with XPath 2.0 functions. The size of the type hierarchy is daunting for new users and creates a barrier to learning XPath 2.0.
The greatest burden on XSLT authors, especially those used to the weak typing of XPath 1.0, comes in the form of the requirement for explicit casting. The implicit casting of untyped nodes to a required type is a great help but does not address common scenarios in which explicit casting is required. In particular, there is no implicit casting of strings (which are often generated following string manipulation of a node) to a required type or of values to strings. And when untyped values are used with arithmetic and comparison operators, the results are sometimes not what an author might expect.
It remains to be seen whether these features of XPath 2.0 prove to limit its uptake amongst authors dealing with untyped data, or are simply irritations that XSLT 2.0 authors will learn to live with.
The only variables you can declare in XPath 2.0 are range variables: those used in for, some, and every expressions. Although all variables in XPath 2.0 have a type, you can’t declare the type of a range variable; it is guaranteed to contain a single item only, so its type is no more specific than item().
If a document is validated against a DTD, then attributes will be assigned a type based on that declared in the DTD.
The namespace for the XPath datatypes won’t be finalized until the Recommendation is issued.
Where XPath 2.0 functions, such as sum() or id(), need to treat different argument types in different ways, they’re defined to accept the general type item(), which covers both nodes and atomic values, or xdt:anyAtomicType, which covers all atomic types. The rejection of those types that aren’t allowed is detailed in the natural-language definition of the function rather than in its signature.