XML::XQL - A Perl module for querying XML tree structures with XQL

SYNOPSIS

 use XML::XQL;
 use XML::XQL::DOM;

 $parser = new XML::DOM::Parser;
 $doc = $parser->parsefile ("file.xml");

 # Return all elements with tagName='title' under the root element 'book'
 $query = new XML::XQL::Query (Expr => "book/title");
 @result = $query->solve ($doc);
 $query->dispose; # Avoid memory leaks - Remove circular references

 # Or (to save some typing)
 @result = XML::XQL::solve ("book/title", $doc);

 # Or (to save even more typing)
 @result = $doc->xql ("book/title");

DESCRIPTION

The XML::XQL module implements the XQL (XML Query Language) proposal submitted to the XSL Working Group in September 1998. The spec can be found at <http://www.w3.org/TandS/QL/QL98/pp/xql.html>. Most of the contents related to the XQL syntax can also be found in the XML::XQL::Tutorial that comes with this distribution. Note that XQL is not the same as XML-QL!

The current implementation only works with the XML::DOM module, but once the design is stable and the major bugs are flushed out, other extensions might follow, e.g. for XML::Grove.

XQL was designed to be extensible and this implementation tries to stick to that. Users can add their own functions, methods, comparison operators and data types. Plugging in a new XML tree structure (like XML::Grove) should be a piece of cake.

To use the XQL module, either

 use XML::XQL;

or

 use XML::XQL::Strict;

The Strict module only provides the core XQL functionality as found in the XQL spec. By default (i.e. by using XML::XQL) you get 'XQL+', which has some additional features.

See the section "Additional Features in XQL+" for the differences.

This module is still in development. See the To-do list in XQL.pm for what still needs to be done. Any suggestions are welcome: the sooner these implementation issues are resolved, the faster we can all use this module.

If you find a bug, you would do me a great favor by sending it to me in the form of a test case. See the file t/xql_template.t that comes with this distribution.

If you have written a cool comparison operator, function, method or XQL data type that you would like to share, send it to [email protected] and I will add it to this module.

XML::XQL global functions

solve (QUERY_STRING, INPUT_LIST...):

 @result = XML::XQL::solve ("doc//book", $doc);

This is provided as a shortcut for:

 $query = new XML::XQL::Query (Expr => "doc//book");
 @result = $query->solve ($doc);
 $query->dispose;

Note that with XML::XQL::DOM, you can also write (see XML::DOM::Node for details):

 @result = $doc->xql ("doc//book");
setDocParser (PARSER): Sets the XML::DOM::Parser that is used by the new XQL+ document() function. By default it uses an XML::DOM::Parser that was created without any arguments, i.e.

 $PARSER = new XML::DOM::Parser;
defineFunction (NAME, FUNCREF, ARGCOUNT [, ALLOWED_OUTSIDE [, CONST [, QUERY_ARG]]]): Defines the XQL function (at the global level, i.e. for all newly created queries) with the specified NAME. The ARGCOUNT parameter can either be a single number or a reference to a list with numbers. A single number expands to [ARGCOUNT, ARGCOUNT]. The list contains pairs of numbers, indicating the number of arguments that the function allows. The value -1 means infinity. E.g. [2, 5, 7, 9, 12, -1] means that the function can have 2, 3, 4, 5, 7, 8, 9, 12 or more arguments. The number of arguments is checked when parsing the XQL query string.

The second parameter (FUNCREF) must be a reference to a Perl function or an anonymous sub, e.g. '\&my_func' or 'sub { ... code ... }'.

If ALLOWED_OUTSIDE (default is 0) is set to 1, the function may also be used outside subqueries in node queries (see the NodeQuery parameter in the Query constructor).

If CONST (default is 0) is set to 1, the function is considered to be "constant". See "Constant Function Invocations" for details.

If QUERY_ARG (default is 0) is not -1, the argument with that index is considered to be a 'query parameter'. If the query parameter is a subquery that returns multiple values, the result list of the function invocation will contain one result value for each value of the subquery. E.g. 'length(book/author)' will return a list of Numbers, denoting the string lengths of all the author elements returned by 'book/author'.

Note that only methods (not functions) may appear after a Bang "!" operator. This is checked when parsing the XQL query string.

See also: defineMethod
generateFunction (NAME, FUNCNAME, RETURN_TYPE [, ARGCOUNT [, ALLOWED_OUTSIDE [, CONST [, QUERY_ARG]]]]): Generates and defines an XQL function wrapper for the Perl function with the name FUNCNAME. The function name will be NAME in XQL query expressions. The return type should be one of the builtin XQL Data Types or a class derived from XML::XQL::PrimitiveType (see "Adding Data Types"). See defineFunction for the meaning of ARGCOUNT, ALLOWED_OUTSIDE, CONST and QUERY_ARG.

Function values are always converted to Perl strings with xql_toString before they are passed to the Perl function implementation. The function return value is cast to an object of type RETURN_TYPE, or to the empty list [] if the result is undef. It uses expandType to expand XQL primitive type names. If RETURN_TYPE is "*", it returns the function result as is, unless the function result is undef, in which case it returns [].
defineMethod (NAME, FUNCREF, ARGCOUNT [, ALLOWED_OUTSIDE]): Defines the XQL method (at the global level, i.e. for all newly created queries) with the specified NAME. The ARGCOUNT parameter can either be a single number or a reference to a list with numbers. A single number expands to [ARGCOUNT, ARGCOUNT]. The list contains pairs of numbers, indicating the number of arguments that the method allows. The value -1 means infinity. E.g. [2, 5, 7, 9, 12, -1] means that the method can have 2, 3, 4, 5, 7, 8, 9, 12 or more arguments. The number of arguments is checked when parsing the XQL query string.

The second parameter (FUNCREF) must be a reference to a Perl function or an anonymous sub, e.g. '\&my_func' or 'sub { ... code ... }'.

If ALLOWED_OUTSIDE (default is 0) is set to 1, the method may also be used outside subqueries in node queries (see the NodeQuery parameter in the Query constructor).

Note that only methods (not functions) may appear after a Bang "!" operator. This is checked when parsing the XQL query string.

See also: defineFunction
defineComparisonOperators (NAME => FUNCREF [, NAME => FUNCREF]*): Defines XQL comparison operators at the global level. The FUNCREF parameters must be references to Perl functions or anonymous subs, e.g. '\&my_func' or 'sub { ... code ... }'. E.g. to define the operators $my_op$ and $my_op2$:

 XML::XQL::defineComparisonOperators ('my_op' => \&my_op,
                                      'my_op2' => sub { ... insert code here ... });
defineElementValueConvertor (TAG_NAME, FUNCREF): Defines that the result of the value() call for Elements with the specified TAG_NAME uses the specified function. The function will receive two parameters: the first is the Element node itself and the second is the TAG_NAME of the Element node. FUNCREF should be a reference to a Perl function, e.g. \&my_sub, or an anonymous sub. E.g. to define that all Elements with tag name 'date-of-birth' should return XML::XQL::Date objects:

 XML::XQL::defineElementValueConvertor ('date-of-birth', sub {
     my $elem = shift;
     # Always pass in the node as the second parameter. This is
     # the reference node for the object, which is used when
     # sorting values in document order.
     new XML::XQL::Date ($elem->xql_text, $elem);
 });

These converters can only be specified at a global level, not on a per query basis. To undefine a converter, simply pass a FUNCREF of undef.
defineAttrValueConvertor (ELEM_TAG_NAME, ATTR_NAME, FUNCREF): Defines that the result of the value() call for Attributes with the specified ATTR_NAME and a parent Element with the specified ELEM_TAG_NAME uses the specified function. An ELEM_TAG_NAME of "*" will match regardless of the tag name of the parent Element. The function will receive 3 parameters: the first is the Attribute node itself, the second is the ATTR_NAME and the third is the tag name of the parent Element (even if ELEM_TAG_NAME was "*"). FUNCREF should be a reference to a Perl function, e.g. \&my_sub, or an anonymous sub. These converters can only be specified at a global level, not on a per query basis. To undefine a converter, simply pass a FUNCREF of undef.
defineTokenQ (Q): Defines the token for the q// string delimiters at a global level. The default value for XQL+ is 'q', for XML::XQL::Strict it is undef. A value of undef will deactivate this feature.
defineTokenQQ (QQ): Defines the token for the qq// string delimiters at a global level. The default value for XQL+ is 'qq', for XML::XQL::Strict it is undef. A value of undef will deactivate this feature.
expandType (TYPE): Used internally to expand type names of XQL primitive types. E.g. it expands "Number" to "XML::XQL::Number" and is not case-sensitive, so "number" and "NuMbEr" will both expand correctly.
defineExpandedTypes (ALIAS, FULL_NAME [, ...]): For each pair of arguments it allows the class name FULL_NAME to be abbreviated with ALIAS. The definitions are used by expandType(). (ALIAS is always converted to lowercase internally, because expandType is case-insensitive.) Overriding the ALIAS for "date" also affects the object type returned by the date() function.
setErrorContextDelimiters (START, END, BOLD_ON, BOLD_OFF): Sets the delimiters used when printing error messages during query evaluation. The default delimiters on Unix are `tput smul` (underline on) and `tput rmul` (underline off). On other systems (that don't have tput), the delimiters are ">>" and "<<" respectively. When printing the error message, the subexpression that caused the error will be enclosed by the delimiters, i.e. underlined on Unix. For certain subexpressions the significant keyword, e.g. "$and$", is enclosed in the bold delimiters BOLD_ON (default: `tput bold` on Unix, "" elsewhere) and BOLD_OFF (default: (`tput rmul` . `tput smul`) on Unix, "" elsewhere; see $BoldOff in XML::XQL::XQL.pm for details).
isEmptyList (VAR): Returns 1 if VAR is [], else 0. Can be used in user defined functions.
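The ARGCOUNT pair list accepted by defineFunction and defineMethod can be illustrated in plain Perl. The sketch below is not taken from XML::XQL: argcount_allows() is a hypothetical helper, and it assumes each pair is an inclusive [min, max] range with -1 as the upper bound meaning "no limit", as described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::XQL): returns 1 if $n arguments
# are allowed by an ARGCOUNT spec, else 0. The spec is either a single
# number or a reference to a flat list of (min, max) pairs; a max of -1
# means "no upper limit".
sub argcount_allows {
    my ($spec, $n) = @_;
    # A single number N expands to the pair (N, N).
    my @pairs = ref $spec ? @$spec : ($spec, $spec);
    while (my ($min, $max) = splice @pairs, 0, 2) {
        return 1 if $n >= $min && ($max == -1 || $n <= $max);
    }
    return 0;
}

my $spec = [2, 5, 7, 9, 12, -1];
print join(' ', grep { argcount_allows($spec, $_) } 0 .. 14), "\n";
# prints: 2 3 4 5 7 8 9 12 13 14
```

The printed counts correspond to "2, 3, 4, 5, 7, 8, 9, 12 or more", cut off at 14 only because the loop stops there.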

Additional Features in XQL+

Parent operator '..'

The '..' operator returns the parent of the current node, where '.' would return the current node. This is not part of any XQL standard, because you would normally use return operators, which are not implemented here.

Sequence operators ';' and ';;'

The sequence operators ';' (precedes) and ';;' (immediately precedes) are not in the XQL spec, but are described in 'The Design of XQL' by Jonathan Robie, one of the designers of XQL. It can be found at <http://www.texcel.no/whitepapers/xql-design.html>. See also the XQL Tutorial for a description of what they mean.

q// and qq// String Tokens

String tokens a la q// and qq// are allowed. q// evaluates like Perl's single quotes and qq// like Perl's double quotes. Note that the default XQL strings do not allow escaping etc., so it's not possible to define a string with both single and double quotes. If 'q' and 'qq' are not to your liking, you may redefine them to something else or undefine them altogether, by assigning undef to them. E.g.:

 # at a global level - shared by all queries (that don't (re)define 'q')
 XML::XQL::defineTokenQ ('k');
 XML::XQL::defineTokenQQ (undef);

 # at a query level - only defined for this query
 $query = new XML::XQL::Query (Expr => "book/title", q => 'k', qq => undef);

From now on k// works like q// did and qq// doesn't work at all anymore.

Query strings can have embedded Comments

For example:

 $queryExpr = "book/title          # this comment is inside the query string
               [. = 'Moby Dick']"; # this comment is outside

Optional dollar delimiters and case-insensitive XQL keywords

The following XQL keywords are case-insensitive and the dollar sign delimiters may be omitted: $and$, $or$, $not$, $union$, $intersect$, $to$, $any$, $all$, $eq$, $ne$, $lt$, $gt$, $ge$, $le$, $ieq$, $ine$, $ilt$, $igt$, $ige$, $ile$. E.g. $AND$, $And$, $aNd$, and, And, aNd are all valid replacements for $and$.

Note that XQL+ comparison operators ($match$, $no_match$, $isa$, $can$) still require dollar delimiters and are case-sensitive.

$match$ or =~: E.g. "book/title[. =~ '/(Moby|Dick)/']" will return all book titles containing Moby or Dick. Note that the match expression needs to be quoted and should contain the // or m// delimiters for Perl. When casting the values to be matched, both are converted to Text.
$no_match$ or !~: E.g. "book/title[. !~ '/(Moby|Dick)/']" will return all book titles that don't contain Moby or Dick. As with $match$, the match expression needs to be quoted and should contain the // or m// delimiters, and both values are converted to Text.
$isa$: E.g. '//. $isa$ "XML::XQL::Date"' returns all elements for which the value() function returns an XML::XQL::Date object. (Note that the value() function can be overridden to return a specific object type for certain elements and attributes.) It uses expandType to expand XQL primitive type names.
$can$: E.g. '//. $can$ "swim"' returns all elements for which the value() function returns an object that implements the (Perl) swim() method. (Note that the value() function can be overridden to return a specific object type for certain elements and attributes.)
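To make the quoting rule for $match$ and $no_match$ concrete, here is a plain-Perl sketch of how a right-hand operand such as '/(Moby|Dick)/' or 'm/moby/i' could be turned into a Perl pattern and applied to a Text value. match_expr() and no_match_expr() are hypothetical helpers, not XML::XQL functions, and the sketch only accepts slash delimiters and the i, m, s and x flags.

```perl
use strict;
use warnings;

# Hypothetical sketch (not XML::XQL's code) of evaluating a $match$
# operand: the right-hand side is a string that carries its own Perl
# match delimiters, e.g. '/(Moby|Dick)/' or 'm/moby/i'.
sub match_expr {
    my ($text, $expr) = @_;
    # Only slash delimiters and the i, m, s, x flags are handled here.
    my ($pattern, $flags) = $expr =~ m{^m?/(.*)/([imsx]*)$}s
        or die "unsupported match expression: $expr\n";
    # Turn trailing flags into an inline modifier group.
    my $re = length $flags ? "(?$flags)$pattern" : $pattern;
    return $text =~ /$re/ ? 1 : 0;
}

# $no_match$ (!~) is simply the negation.
sub no_match_expr { return match_expr(@_) ? 0 : 1 }

my @titles = ('Moby Dick', 'Ulysses', 'The Dick Van Dyke Show');
print join('; ', grep { match_expr($_, '/(Moby|Dick)/') } @titles), "\n";
```

Because both sides are converted to Text first, a helper like this only ever needs to see plain strings.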

Function: once (QUERY)

E.g. 'once(id("foo"))' will evaluate the QUERY expression only once per query. Certain query results (like the above example) will always return the same value within a query. Using once() will cache the QUERY result for the rest of the query. Note that "constant" function invocations are always cached. See also "Constant Function Invocations".
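The caching behaviour of once() amounts to per-query memoization. The following sketch models it with a hypothetical OnceCache class (not part of XML::XQL) keyed on the subexpression text; creating a fresh cache for each query mirrors the "once per query" rule.

```perl
use strict;
use warnings;

# Hypothetical model of once(): the first evaluation of a
# subexpression is stored in a per-query cache and reused afterwards.
package OnceCache;

sub new { return bless { cache => {} }, shift }

sub once {
    my ($self, $key, $compute) = @_;
    # Compute only on the first call for this key.
    $self->{cache}{$key} = [ $compute->() ] unless exists $self->{cache}{$key};
    return @{ $self->{cache}{$key} };
}

package main;

my $calls  = 0;
my $query  = OnceCache->new;                # one cache per query
my $lookup = sub { $calls++; return ('foo-element') };
$query->once('id("foo")', $lookup) for 1 .. 3;
print "evaluated $calls time(s)\n";         # evaluated 1 time(s)
```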

Function: subst (QUERY, EXPR, EXPR [, MODIFIERS [, MODE]])

E.g. 'subst(book/title, "[Mm]oby", "Dick", "g")' will replace Moby or moby with Dick globally ("g") in all book title elements. Underneath it uses Perl's substitute operator s///. Don't worry about which delimiters are used underneath. The function returns all the book/titles for which a substitution occurred. The default MODIFIERS string is "" (empty). The function name may be abbreviated to "s".

For most Node types, it converts the value() to a string (with xql_toString) to match the string and uses xql_setValue to set the new value in case it matched. For XQL primitives (Boolean, Number, Text) and other data types (e.g. Date) it uses xql_toString to match the String and xql_setValue to set the result. Beware that performing a substitution on a primitive that was found in the original XQL query expression changes the value of that constant.

If MODE is 0 (default), it treats Element nodes differently by matching and replacing text blocks occurring in the Element node. A text block is defined as the concatenation of the raw text of consecutive Text, CDATASection and EntityReference nodes. In this mode it skips embedded Element nodes. If a text block matches, it is replaced by a single Text node, regardless of the original node type(s).

If MODE is 1, it treats Element nodes like the other nodes, i.e. it converts the value() to a string etc. Note that the default implementation of value() calls text(), which normalizes whitespace and includes embedded Element descendants (recursively.) This is probably not what you want to use in most cases, but since I'm not a professional psychic... :-)
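The MODE 0 text-block rule can be modelled without a DOM. In the sketch below, child nodes are hypothetical [TYPE, RAW_TEXT] pairs rather than XML::DOM objects, and subst_text_blocks() is an illustrative helper, not the module's implementation. It joins runs of consecutive Text, CDATASection and EntityReference children, skips Element children, and collapses a matching run into one Text node.

```perl
use strict;
use warnings;

# Hypothetical model of subst() in MODE 0 (not XML::XQL's code).
# Child nodes are [TYPE, RAW_TEXT] pairs; Element children are skipped.
my %is_text = map { $_ => 1 } qw(Text CDATASection EntityReference);

sub subst_text_blocks {
    my ($children, $from, $to, $global) = @_;
    my (@out, @block);
    my $changed = 0;
    # Close the current text block: substitute in its joined raw text.
    my $flush = sub {
        return unless @block;
        my $text = join '', map { $_->[1] } @block;
        my $n = $global ? ($text =~ s/$from/$to/g) : ($text =~ s/$from/$to/);
        if ($n) {
            push @out, [ 'Text', $text ];   # a matching block becomes one Text node
            $changed = 1;
        } else {
            push @out, @block;              # unmatched blocks are kept unchanged
        }
        @block = ();
    };
    for my $child (@$children) {
        if ($is_text{ $child->[0] }) { push @block, $child }
        else                         { $flush->(); push @out, $child }
    }
    $flush->();
    return ($changed, \@out);
}

my $title = [
    [ 'Text',         'Call me ' ],
    [ 'CDATASection', 'moby' ],
    [ 'Element',      '<i>Moby</i>' ],
    [ 'Text',         ' again' ],
];
my ($changed, $new) = subst_text_blocks($title, qr/[Mm]oby/, 'Dick', 1);
print join(' | ', map { "$_->[0]: $_->[1]" } @$new), "\n";
```

In this example the embedded <i>Moby</i> element is left alone, because MODE 0 skips embedded Element nodes.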

Function: map (QUERY, CODE)

E.g. 'map(book/title, "s/[Mm]oby/Dick/g; $_")' will replace Moby or moby with Dick globally ("g") in all book title elements. Underneath it uses Perl's map operator. The function returns all the book/titles for which a change occurred.

Function: eval (EXPR [, TYPE])

Evaluates the Perl expression EXPR and returns an object of the specified TYPE. It uses expandType to expand XQL primitive type names. If the result of the eval was undef, the empty list [] is returned. E.g. 'eval("2 + 5", "Number")' returns a Number object with the value 7, and 'eval("$ENV{USER}")' returns a Text object with the user name.

Consider using once() to cache the return value when the invocation will return the same result for each invocation within a query.

Function: new (TYPE [, QUERY [, PAR] *])

Creates a new object of the specified object TYPE. The constructor may have any number of arguments. The first argument of the constructor (the 2nd argument of the new() function) is considered to be a 'query parameter'. See defineFunction for a definition of query parameter. It uses expandType to expand XQL primitive type names.

Function: document (QUERY) or doc (QUERY)

The document() function creates a new XML::DOM::Document for each result of QUERY (QUERY may be a simple string expression, like "/usr/enno/file.xml". See t/xql_document.t or below for an example with a more complex QUERY.) document() may be abbreviated to doc().

document() uses an XML::DOM::Parser underneath, which can be set with XML::XQL::setDocParser(). By default it uses a parser that was created without any arguments, i.e.

 $PARSER = new XML::DOM::Parser;

Let's try a more complex example, assuming $doc contains:

 <doc>
  <file name="file1.xml"/>
  <file name="file2.xml"/>
 </doc>

Then the following query will return two XML::DOM::Documents, one for file1.xml and one for file2.xml:

 @result = XML::XQL::solve ("document(doc/file/@name)", $doc);

The resulting documents can be used as input for following queries, e.g.

 @result = XML::XQL::solve ("document(doc/file/@name)/root/bla", $doc);

will return all /root/bla elements from the documents returned by document().

Method: DOM_nodeType ()

Returns the DOM node type. Note that these are mostly the same as nodeType(), except for CDATASection and EntityReference nodes. DOM_nodeType() returns 4 and 5 respectively, whereas nodeType() returns 3, because they are considered text nodes.

Function wrappers for Perl builtin functions

XQL function wrappers have been provided for most Perl builtin functions. When using a Perl builtin function like "substr" in an XQL+ query, an XQL function wrapper will be generated on the fly. The arguments to these functions may be regular XQL+ subqueries (that return one or more values) for a query parameter (see defineFunction for a definition). Most wrappers of Perl builtin functions have argument 0 for a query parameter, except for: chmod (parameter 1 is the query parameter), chown (2) and utime (2). The following functions have no query parameter, which means that all parameters should be a single value: atan2, rand, srand, sprintf, rename, unlink, system. The function result is cast to the appropriate XQL primitive type (Number, Text or Boolean), or to an empty list if the result was undef.
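The query-parameter rule (one result per value of the subquery) can be sketched as a small dispatcher. invoke_with_query_param() is hypothetical and not part of XML::XQL; the subquery result is represented by a plain Perl array reference, and an index of -1 stands for "no query parameter", as in defineFunction.

```perl
use strict;
use warnings;

# Hypothetical model (not XML::XQL's code) of a "query parameter":
# the argument at index $qarg holds several values (the results of a
# subquery); the function is applied once per value and the results
# are collected. All other arguments are single values.
sub invoke_with_query_param {
    my ($func, $qarg, @args) = @_;
    return $func->(@args) if $qarg == -1;    # no query parameter
    my @values = @{ $args[$qarg] };
    return map {
        my @call = @args;
        $call[$qarg] = $_;                    # substitute one subquery value
        $func->(@call);
    } @values;
}

# Like length(book/author): one result per author value.
my @authors = ('Melville', 'Joyce', 'Eco');
my @lengths = invoke_with_query_param(sub { length $_[0] }, 0, \@authors);
print "@lengths\n";
```

For a wrapper such as chmod, the same dispatcher would be called with index 1, so the mode stays fixed while the file names vary.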

XPath functions and methods

The following functions were found in the XPath specification:

Function: concat (STRING, STRING, STRING*): The concat function returns the concatenation of its arguments.
Function: starts-with (STRING, STRING): The starts-with function returns true if the first argument string starts with the second argument string, and otherwise returns false.
Function: contains (STRING, STRING): The contains function returns true if the first argument string contains the second argument string, and otherwise returns false.
Function: substring-before (STRING, STRING): The substring-before function returns the substring of the first argument string that precedes the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string. For example, substring-before("1999/04/01","/") returns 1999.
Function: substring-after (STRING, STRING): The substring-after function returns the substring of the first argument string that follows the first occurrence of the second argument string in the first argument string, or the empty string if the first argument string does not contain the second argument string. For example, substring-after("1999/04/01","/") returns 04/01, and substring-after("1999/04/01","19") returns 99/04/01.
Function: substring (STRING, NUMBER [, NUMBER]): The substring function returns the substring of the first argument starting at the position specified in the second argument with length specified in the third argument. For example, substring("12345",2,3) returns "234". If the third argument is not specified, it returns the substring starting at the position specified in the second argument and continuing to the end of the string. For example, substring("12345",2) returns "2345". More precisely, each character in the string is considered to have a numeric position: the position of the first character is 1, the position of the second character is 2 and so on. NOTE: This differs from the substr method, which treats the position of the first character as 0. The XPath spec says the following about rounding, but that is not true in this implementation: "The returned substring contains those characters for which the position of the character is greater than or equal to the rounded value of the second argument and, if the third argument is specified, less than the sum of the rounded value of the second argument and the rounded value of the third argument; the comparisons and addition used for the above follow the standard IEEE 754 rules; rounding is done as if by a call to the round function."
Method: string-length ( [ QUERY ] ): The string-length function returns the number of characters in the string. If the argument is omitted, it defaults to the context node converted to a string, in other words the string-value of the context node. Note that the generated XQL wrapper for the Perl built-in length does not allow the argument to be omitted.
Method: normalize-space ( [ QUERY ] ): The normalize-space function returns the argument string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space. Whitespace characters are the same as those allowed by the S production in XML. If the argument is omitted, it defaults to the context node converted to a string, in other words the string-value of the context node.
Function: translate (STRING, STRING, STRING): The translate function returns the first argument string with occurrences of characters in the second argument string replaced by the character at the corresponding position in the third argument string. For example, translate("bar","abc","ABC") returns the string BAr. If there is a character in the second argument string with no character at a corresponding position in the third argument string (because the second argument string is longer than the third argument string), then occurrences of that character in the first argument string are removed. For example, translate("--aaa--","abc-","ABC") returns "AAA". If a character occurs more than once in the second argument string, then the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, then excess characters are ignored. NOTE: The translate function is not a sufficient solution for case conversion in all languages. A future version may provide additional functions for case conversion. This function was implemented using tr///d.
Function: sum ( QUERY ): The sum function returns the sum of the QUERY results, by converting the string values of each result to a number.
Function: floor (NUMBER): The floor function returns the largest (closest to positive infinity) number that is not greater than the argument and that is an integer.
Function: ceiling (NUMBER): The ceiling function returns the smallest (closest to negative infinity) number that is not less than the argument and that is an integer.
Function: round (NUMBER): The round function returns the number that is closest to the argument and that is an integer. If there are two such numbers, then the one that is closest to positive infinity is returned.
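For comparison with Perl's own builtins, the following sketch models substring(), translate() and round() as described above. The helper names are hypothetical, and translate() is written with a lookup table instead of the tr///d the module itself uses; the visible behaviour on the examples above should be the same.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical plain-Perl models of a few XPath functions; they mirror
# the semantics described above but are not XML::XQL's code.

# substring(): positions are 1-based, unlike Perl's substr().
# Assumes $pos >= 1; no rounding is applied.
sub xpath_substring {
    my ($str, $pos, $len) = @_;
    return defined $len ? substr($str, $pos - 1, $len)
                        : substr($str, $pos - 1);
}

# translate(): characters of $from map to those of $to; characters of
# $from without a counterpart in $to are deleted; the first occurrence
# of a repeated character wins; excess characters in $to are ignored.
sub xpath_translate {
    my ($str, $from, $to) = @_;
    my @from = split //, $from;
    my @to   = split //, $to;
    my %map;
    for my $i (0 .. $#from) {
        next if exists $map{ $from[$i] };
        $map{ $from[$i] } = $i < @to ? $to[$i] : '';
    }
    return join '', map { exists $map{$_} ? $map{$_} : $_ } split //, $str;
}

# round(): nearest integer; ties go towards positive infinity.
sub xpath_round { return floor($_[0] + 0.5) }

print xpath_substring('12345', 2, 3), "\n";            # 234
print xpath_substring('12345', 2), "\n";               # 2345
print xpath_translate('bar', 'abc', 'ABC'), "\n";      # BAr
print xpath_translate('--aaa--', 'abc-', 'ABC'), "\n"; # AAA
print xpath_round(-2.5), ' ', xpath_round(2.5), "\n";  # -2 3
```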

Implementation Details

XQL Builtin Data Types

The XQL engine uses the following object classes internally. Only Number, Boolean and Text are considered primitive XQL types:

XML::XQL::Number For integers and floating point numbers.
XML::XQL::Boolean For booleans, e.g. returned by true() and false().
XML::XQL::Text For string values.
XML::XQL::Date For date, time and date/time values. E.g. returned by the date() function.
XML::XQL::Node Superclass of all XML node types. E.g. all subclasses of XML::DOM::Node subclass from this.
Perl list reference Lists of values are passed by reference (i.e. using [] delimiters). The empty list [] has a double meaning. It also means 'undef' in certain situations, e.g. when a function invocation or comparison failed.

Type casting in comparisons

When two values are compared in an XQL comparison (e.g. $eq$) the values are first cast to the same data type. Node values are first replaced by their value() (i.e. the XQL value() function is used, which returns a Text value by default, but may return any data type if the user so chooses.) The resulting values are then cast to the type of the object with the highest xql_primType() value. They are as follows: Node (0), Text (1), Number (2), Boolean (3), Date (4), other data types (4 by default, but this may be overridden by the user.)

E.g. if one value is a Text value and the other is a Number, the Text value is cast to a Number and the resulting low-level (Perl) comparison is (for $eq$):

 $number->xql_toString == $text->xql_toString

If both were Text values, it would have been

 $text1->xql_toString eq $text2->xql_toString

Note that the XQL spec is vague and even conflicting where it concerns type casting. This implementation resulted after talking to Joe Lapp, one of the spec writers.
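The casting rule can be sketched for the Text/Number case. xql_eq() below is a hypothetical helper (not part of XML::XQL) that represents each value as a [TYPE, STRING] pair, picks the type with the highest xql_primType() value from the list above, and then uses == or eq accordingly; other target types are deliberately left out of the sketch.

```perl
use strict;
use warnings;

# Hypothetical model of $eq$ casting (not XML::XQL's code). Values are
# [TYPE, STRING] pairs; nodes are assumed to be already replaced by
# their value(), which is Text by default.
my %prim_type = (Node => 0, Text => 1, Number => 2, Boolean => 3, Date => 4);

sub xql_eq {
    my ($x, $y) = @_;
    # Cast both sides to the type with the highest priority.
    my ($type) = sort { $prim_type{$b} <=> $prim_type{$a} } $x->[0], $y->[0];
    return $x->[1] == $y->[1] ? 1 : 0 if $type eq 'Number';
    return $x->[1] eq $y->[1] ? 1 : 0 if $type eq 'Text';
    die "casting to $type is not modelled in this sketch\n";
}

print xql_eq([Number => '3'], [Text => '3.0']), "\n";  # 1: numeric ==
print xql_eq([Text => '3'],   [Text => '3.0']), "\n";  # 0: string eq
```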

Adding Data Types

If you want to add your own data type, make sure it derives from XML::XQL::PrimitiveType and implements the necessary methods. I will add more stuff here to explain it all, but for now, look at the code for the primitive XQL types or the Date class (XML::XQL::Date in Date.pm.)

Document Order

The XQL spec states that query results always return their values in document order, which means the order in which they appeared in the original XML document. Values extracted from Nodes (e.g. with value(), text(), rawText(), nodeName(), etc.) always have a pointer to the reference node (i.e. the Node from which the value was extracted.) These pointers are acknowledged when (intermediate) result lists are sorted. Currently, the only place where a result list is sorted is in a $union$ expression, which is the only place where the result list can be unordered. (If you find that this is not true, let me know.) Non-node values that have no associated reference node always end up at the end of the result list in the order that they were added.

The XQL spec states that the reference node for an XML Attribute is the Element to which it belongs, and that the order of values with the same reference node is undefined. This means that the order of an Element and its attributes would be undefined. But since the XML::DOM module keeps track of the order of the attributes, the XQL engine does the same, and therefore, the attributes of an Element are sorted and appear after their parent Element in a sorted result list.
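The placement of values without a reference node can be sketched as a stable sort. In this hypothetical model each value is a hash whose optional pos field stands for the document position of its reference node; sort_document_order() is illustrative only, is not XML::XQL's code, and does not model the attribute ordering described above.

```perl
use strict;
use warnings;

# Hypothetical model of document-order sorting (not XML::XQL's code).
# Each value is a hash with an optional 'pos': the document position of
# its reference node. Values without a reference node keep their
# insertion order and go to the end.
sub sort_document_order {
    my @values = @_;
    my @idx = sort {
        my ($pa, $pb) = ($values[$a]{pos}, $values[$b]{pos});
        (defined $pa ? 0 : 1) <=> (defined $pb ? 0 : 1)      # node values first
            || (defined $pa && defined $pb ? $pa <=> $pb : 0) # then by position
            || $a <=> $b;                                     # then insertion order
    } 0 .. $#values;
    return @values[@idx];
}

my @union = (
    { name => 'title#2', pos => 7 },
    { name => 'literal', pos => undef },
    { name => 'title#1', pos => 3 },
    { name => 'constant' },
);
print join(', ', map { $_->{name} } sort_document_order(@union)), "\n";
# title#1, title#2, literal, constant
```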

Constant Function Invocations

If a function always returns the same value when given "constant" arguments, the function is considered to be "constant". A "constant" argument can be either an XQL primitive (Number, Boolean, Text) or a "constant" function invocation. E.g.

 date("12-03-1998")
 true()
 sin(0.3)
 length("abc")
 date(substr("12-03-1998 is the date", 0, 10))

are constant, but not:

 length(book[2])

Results of constant function invocations are cached and calculated only once for each query. See also the CONST parameter in defineFunction. It is not necessary to wrap constant function invocations in a once() call.

Constant XQL functions are: date, true, false and a lot of the XQL+ wrappers for Perl builtin functions. Function wrappers for certain builtins are not made constant on purpose to force the invocation to be evaluated every time, e.g. 'mkdir("/user/enno/my_dir", "0644")' (although constant in appearance) may return different results for multiple invocations. See %PerlFunc in Plus.pm for details.
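The recursive definition of a "constant" invocation translates directly into a tree walk. The sketch below uses a hypothetical expression representation and a hypothetical %const_func table seeded with the function names from the examples above; it is not how XML::XQL stores parsed queries.

```perl
use strict;
use warnings;

# Hypothetical model (not XML::XQL's code) of the "constant" rule.
# Expression nodes: { prim => VALUE } for Number/Boolean/Text literals,
# { call => NAME, args => [...] } for function invocations and
# { path => EXPR } for anything that depends on the document.
my %const_func = map { $_ => 1 } qw(date true false sin length substr);

sub is_constant {
    my ($expr) = @_;
    return 1 if exists $expr->{prim};
    return 0 unless exists $expr->{call};
    # A call is constant if the function is marked CONST and every
    # argument is itself constant.
    return 0 unless $const_func{ $expr->{call} };
    for my $arg (@{ $expr->{args} || [] }) {
        return 0 unless is_constant($arg);
    }
    return 1;
}

# date(substr("12-03-1998 is the date", 0, 10)) versus length(book[2])
my $c1 = { call => 'date', args => [
    { call => 'substr', args => [
        { prim => '12-03-1998 is the date' }, { prim => 0 }, { prim => 10 } ] } ] };
my $c2 = { call => 'length', args => [ { path => 'book[2]' } ] };
print is_constant($c1), ' ', is_constant($c2), "\n";   # 1 0
```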

Function: count ([QUERY])

The count() function has no parameters in the XQL spec. In this implementation it will return the number of QUERY results when passed a QUERY parameter.

Method: text ([RECURSE])

When expanding an Element node, the text() method adds the expanded text() value of sub-Elements. When RECURSE is set to 0 (default is 1), it will not include sub-elements. This is useful e.g. when using the $match$ operator in a recursive context (using the // operator), so it won't return parent Elements when one of the children matches.
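The effect of RECURSE can be sketched with a nested Perl data structure standing in for an Element. element_text() is hypothetical (not part of XML::XQL) and, unlike the real text() method, does not normalize whitespace.

```perl
use strict;
use warnings;

# Hypothetical model of text([RECURSE]) (not XML::XQL's code).
# An element is { children => [...] }; text content is a plain string.
sub element_text {
    my ($elem, $recurse) = @_;
    $recurse = 1 unless defined $recurse;    # default is 1
    my @parts;
    for my $child (@{ $elem->{children} }) {
        if (ref $child) {
            # Sub-elements are only expanded when recursing.
            push @parts, element_text($child, 1) if $recurse;
        } else {
            push @parts, $child;
        }
    }
    return join '', @parts;
}

my $chapter = { children => [ 'Call me ', { children => ['Ishmael'] }, '.' ] };
print element_text($chapter), "\n";     # Call me Ishmael.
print element_text($chapter, 0), "\n";  # Call me .
```

With RECURSE set to 0, a $match$ against "Ishmael" would succeed only for the inner element, not for its parent.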

Method: rawText ([\s-1RECURSE\s0])

See text().

SEE ALSO

XML::XQL::Query, XML::XQL::DOM, XML::XQL::Date

The Japanese version of this document can be found on-line at <http://member.nifty.ne.jp/hippo2000/perltips/xml/xql.htm>

The XML::XQL::Tutorial manual page. The Japanese version can be found at <http://member.nifty.ne.jp/hippo2000/perltips/xml/xql/tutorial.htm>

The XQL spec at <http://www.w3.org/TandS/QL/QL98/pp/xql.html>

The Design of XQL at <http://www.texcel.no/whitepapers/xql-design.html>

The DOM Level 1 specification at <http://www.w3.org/TR/REC-DOM-Level-1>

The XML spec (Extensible Markup Language 1.0) at <http://www.w3.org/TR/REC-xml>

The XML::Parser and XML::Parser::Expat manual pages.

AUTHOR

Enno Derksen is the original author.

Please send bugs, comments and suggestions to T.J. Mather <[email protected]>

XML::XQL (3pm)