HTML::TreeBuilder::XPath: Add XPath support to HTML::TreeBuilder

METHODS

Extra methods added both to the tree object and to each element:

findnodes ($path)

Returns a list of nodes found by $path. In scalar context returns a Tree::XPathEngine::NodeSet object.
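A minimal sketch of the two calling contexts, assuming HTML::TreeBuilder::XPath is installed. The HTML snippet is invented for illustration. `new_from_content` and `delete` are inherited from HTML::TreeBuilder rather than described in this section.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Illustrative input; new_from_content() is inherited from HTML::TreeBuilder.
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li>one</li><li>two</li></ul>');

# List context: a plain Perl list of HTML::Element objects.
my @items      = $tree->findnodes('//li');
my $list_count = scalar @items;

# Scalar context: a single Tree::XPathEngine::NodeSet object.
my $nodeset  = $tree->findnodes('//li');
my $set_size = $nodeset->size;

print "list: $list_count, nodeset: $set_size\n";    # list: 2, nodeset: 2

# Free the tree explicitly, as HTML::TreeBuilder recommends.
$tree->delete;
```

Use list context when you just want to loop over the elements; use the NodeSet when you want to pass the result around as one value or ask for its size.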

findnodes_as_string ($path)

Returns the text values of the nodes, as one string.

findnodes_as_strings ($path)

Returns a list of the values of the result nodes.
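The difference between the two string helpers, as a small sketch. It assumes HTML::TreeBuilder::XPath is installed and uses a made-up HTML snippet. The exact way the single string is joined is left to the engine, so the sketch only shows that both texts end up in it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li>one</li><li>two</li></ul>');

# One string holding the text of every matching node.
my $joined = $tree->findnodes_as_string('//li');

# A list with one text value per matching node.
my @texts = $tree->findnodes_as_strings('//li');

print "$joined\n";
print join('|', @texts), "\n";

$tree->delete;
```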

findvalue ($path)

Returns either a Tree::XPathEngine::Literal, a Tree::XPathEngine::Boolean or a Tree::XPathEngine::Number object. If the path returns a NodeSet, $nodeset->xpath_to_literal is called automatically for you (and thus a Tree::XPathEngine::Literal is returned). Note that stringification is overloaded for each of these objects, so you can just print the value found, or manipulate it as you would a normal Perl value (e.g. using regular expressions).
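A sketch of the three result types, assuming HTML::TreeBuilder::XPath is installed. The HTML snippet and the XPath expressions are illustrative only; `value` is the accessor on the Tree::XPathEngine result objects.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li>one</li><li>two</li></ul>');

# A path that selects nodes comes back as a Tree::XPathEngine::Literal.
my $first = $tree->findvalue('//li[1]');

# XPath functions and comparisons give Number and Boolean objects.
my $count    = $tree->findvalue('count(//li)');
my $has_many = $tree->findvalue('count(//li) > 1');

# Overloaded stringification: the objects print like plain values...
print "first=$first count=$count\n";
print "more than one item\n" if $has_many->value;

# ...and can be matched against regular expressions.
print "first item starts with 'o'\n" if $first =~ /^o/;

$tree->delete;
```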

findvalues ($path)

Returns the values of the matching nodes as a list. This is mostly the same as findnodes_as_strings, except that the elements of the list are objects (with overloaded stringification) instead of plain strings.
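A short sketch, assuming HTML::TreeBuilder::XPath is installed and using an invented HTML snippet. Interpolating each element into a string relies on the overloaded stringification mentioned above, so the same code also works if the values happen to be plain strings.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li>one</li><li>two</li></ul>');

# One value object per matching node.
my @values = $tree->findvalues('//li');

# Force each value to a plain string via its overloaded stringification.
my @plain = map { "$_" } @values;

print join(', ', @plain), "\n";

$tree->delete;
```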

exists ($path)

Returns true if the given path matches at least one node.

matches ($path)

Returns true if the element matches the path.
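A sketch contrasting `exists`, which asks whether a path selects anything, with `matches`, which asks whether one particular element is selected. It assumes HTML::TreeBuilder::XPath is installed; the markup and the `current` class are made up. Absolute paths are used with `matches` so the result does not depend on which context node a relative path would be evaluated from.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li class="current">one</li><li>two</li></ul>');

# exists(): does the path select at least one node in the tree?
my $has_items = $tree->exists('//li')    ? 1 : 0;
my $has_table = $tree->exists('//table') ? 1 : 0;

# matches(): is this particular element among the nodes the path selects?
my ($first, $second) = $tree->findnodes('//li');
my $first_is_current  = $first->matches('//li[@class="current"]')  ? 1 : 0;
my $second_is_current = $second->matches('//li[@class="current"]') ? 1 : 0;

print "items: $has_items, table: $has_table\n";
print "first current: $first_is_current, second current: $second_is_current\n";

$tree->delete;
```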

find ($path)

The find method takes an XPath expression (a string) and returns either a Tree::XPathEngine::NodeSet object containing the nodes it found (empty if no nodes matched the path), or one of Tree::XPathEngine::Literal (a string), Tree::XPathEngine::Number, or Tree::XPathEngine::Boolean. It should always return an object, so you can use ->isa() to find out which kind it returned. To check how many nodes it found, use $nodeset->size. See Tree::XPathEngine::NodeSet.
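A sketch of the ->isa() dispatch described above, assuming HTML::TreeBuilder::XPath is installed. `describe_result` is a hypothetical helper written for this example, and the HTML and XPath expressions are illustrative. NodeSet is tested first, since that is the usual result for a location path.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<ul><li>one</li><li>two</li></ul>');

# Hypothetical helper: report what kind of object find() returned.
sub describe_result {
    my ($result) = @_;
    if ($result->isa('Tree::XPathEngine::NodeSet')) {
        return 'nodeset of ' . $result->size . ' node(s)';
    }
    if ($result->isa('Tree::XPathEngine::Number')) {
        return 'number ' . $result->value;
    }
    if ($result->isa('Tree::XPathEngine::Boolean')) {
        return 'boolean ' . ($result->value ? 'true' : 'false');
    }
    if ($result->isa('Tree::XPathEngine::Literal')) {
        return 'literal ' . $result->value;
    }
    return 'unknown result type';
}

my @descriptions = map { describe_result($tree->find($_)) }
    '//li', '//table', 'count(//li)', 'count(//li) = 2', 'string(//li[1])';

print "$_\n" for @descriptions;

$tree->delete;
```

Note that a path matching nothing ('//table' here) still yields a NodeSet, just an empty one, so checking size is the way to detect "no match".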

as_XML_compact

HTML::TreeBuilder's as_XML output is not really nice to look at, so I added a new method that can be used as a simple replacement for it. It escapes only '<', '>' and '&' (plus '"' in attribute values), and wraps the content of CDATA elements (such as script and style) in CDATA sections.

Note that the XML is not actually guaranteed to be valid at this point. Nothing is done about the encoding of the string. Patches, or just ideas about how it could work, are welcome.

as_XML_indented

Same as as_XML, except that the output is indented.
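A sketch of both serializers, assuming HTML::TreeBuilder::XPath is installed. The HTML snippet is invented so that it contains an ampersand in text and quotes in an attribute value; the exact layout of the indented output is not shown here, since it may vary between versions.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# The parser decodes &amp; and &quot; on input; the serializers re-escape them.
my $tree = HTML::TreeBuilder::XPath->new_from_content(
    '<p title="a &quot;quoted&quot; title">Fish &amp; chips</p>');

# Compact: only <, >, & (and " inside attribute values) are escaped.
my $compact = $tree->as_XML_compact;

# Indented: the same kind of output, laid out for human readers.
my $indented = $tree->as_XML_indented;

print $compact, "\n";
print $indented, "\n";

$tree->delete;
```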

HTML::TreeBuilder::XPath (3pm)

SYNOPSIS

DESCRIPTION