SYNOPSIS

 use HTML::HTML5::Writer;

 my $writer = HTML::HTML5::Writer->new;
 print $writer->document($dom);

DESCRIPTION

This module outputs XML::LibXML::Node objects as HTML5 strings. It works well on DOM trees that represent valid HTML/XHTML documents; less well on other DOM trees.

Constructor

Create a new writer object. Options include:

  • markup Choose which serialisation of HTML5 to use: 'html' or 'xhtml'.

  • polyglot Set to true in order to attempt to produce output which works as both XML and HTML. Set to false to produce content that might not. If you don't explicitly set it, then it defaults to false for HTML, and true for XHTML.

  • doctype Set this to a string to choose which <!DOCTYPE> tag to output. Note, this purely sets the <!DOCTYPE> tag and does not change how the rest of the document is output. This really is just a plain string literal...

     # Yes, this works...
     my $w = HTML::HTML5::Writer->new(doctype => '<!doctype html>');

    The following constants are provided for convenience: DOCTYPE_HTML2, DOCTYPE_HTML32, DOCTYPE_HTML4 (latest stable strict HTML 4.x), DOCTYPE_HTML4_RDFA (latest stable HTML 4.x+RDFa), DOCTYPE_HTML40 (strict), DOCTYPE_HTML40_FRAMESET, DOCTYPE_HTML40_LOOSE, DOCTYPE_HTML40_STRICT, DOCTYPE_HTML401 (strict), DOCTYPE_HTML401_FRAMESET, DOCTYPE_HTML401_LOOSE, DOCTYPE_HTML401_RDFA10, DOCTYPE_HTML401_RDFA11, DOCTYPE_HTML401_STRICT, DOCTYPE_HTML5, DOCTYPE_LEGACY (about:legacy-compat), DOCTYPE_NIL (empty string), DOCTYPE_XHTML1 (strict), DOCTYPE_XHTML1_FRAMESET, DOCTYPE_XHTML1_LOOSE, DOCTYPE_XHTML1_STRICT, DOCTYPE_XHTML11, DOCTYPE_XHTML_BASIC, DOCTYPE_XHTML_BASIC_10, DOCTYPE_XHTML_BASIC_11, DOCTYPE_XHTML_MATHML_SVG, DOCTYPE_XHTML_RDFA (latest stable strict XHTML+RDFa), DOCTYPE_XHTML_RDFA10, DOCTYPE_XHTML_RDFA11.

    Defaults to DOCTYPE_HTML5 for HTML and DOCTYPE_LEGACY for XHTML.

  • charset This module always returns strings in Perl's internal utf8 encoding, but you can set the 'charset' option to 'ascii' to create output that would be suitable for re-encoding to ASCII (e.g. it will entity-encode characters which do not exist in ASCII).

  • quote_attributes Set this to true to force attributes to be quoted. If not explicitly set, the writer will automatically detect when attributes need quoting.

  • voids Set this to true to force void elements to always be terminated with '/>'. If not explicitly set, they'll only be terminated that way in polyglot or XHTML documents.

  • start_tags and end_tags Except in polyglot and XHTML documents, some elements allow their start and/or end tags to be omitted in certain circumstances. By setting these to true, you can prevent them from being omitted.

  • refs Special characters that can't be encoded as named entities need to be encoded as numeric character references instead. These can be expressed in decimal or hexadecimal. Setting this option to 'dec' or 'hex' allows you to choose. The default is 'hex'.
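As a rough illustration of how the options above combine, here is a minimal sketch that builds a writer with several of them set explicitly. The input document and the particular option values are illustrative assumptions; only option names described above are used.

```perl
use strict;
use warnings;
use XML::LibXML;
use HTML::HTML5::Writer;

# A small XHTML document to serialise (illustrative input only).
my $dom = XML::LibXML->new->parse_string(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
  . '<head><title>Example</title></head>'
  . '<body><p class="intro">Caf&#233;<br/>menu</p></body>'
  . '</html>'
);

# Every key below is one of the constructor options listed above.
my $writer = HTML::HTML5::Writer->new(
    markup           => 'html',    # 'html' or 'xhtml'
    polyglot         => 0,         # do not aim for XML-compatible output
    charset          => 'ascii',   # entity-encode non-ASCII characters
    quote_attributes => 1,         # always quote attribute values
    voids            => 1,         # terminate void elements with '/>'
    start_tags       => 1,         # never omit optional start tags
    end_tags         => 1,         # never omit optional end tags
    refs             => 'dec',     # decimal numeric character references
);

my $html = $writer->document($dom);
print $html, "\n";
```

Leaving an option out falls back to the defaults described above, so in practice most callers would probably set only 'markup' and perhaps 'charset'.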

Public Methods

  • Outputs (i.e. returns a string that is) an XML::LibXML::Document as HTML.

  • Outputs an XML::LibXML::Element as HTML.

  • Outputs an XML::LibXML::Attr as HTML.

  • Outputs an XML::LibXML::Text as HTML.

  • Outputs an XML::LibXML::CDATASection as HTML.

  • Outputs an XML::LibXML::Comment as HTML.

  • Outputs an XML::LibXML::PI as HTML.

  • Outputs the writer's DOCTYPE.

  • Takes a string and returns the same string with some special characters replaced. These special characters do not include any of '&', '<', '>' or '"', but you can provide a string of additional characters to treat as special:

     $encoded = $writer->encode_entities($raw, characters=>'&<>"');

  • Returns $char entity-encoded. Encoding is done regardless of whether $char is "special" or not.

  • Boolean indicating if $writer is configured to output XHTML.

  • Boolean indicating if $writer is configured to output polyglot HTML.

  • Booleans indicating whether optional start and end tags should be forced.

  • Boolean indicating whether attributes need to be quoted.

  • Boolean indicating whether void elements should be closed in the XHTML style.
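The effect of the 'characters' argument to encode_entities can be seen by encoding the same string twice. This is a small sketch with a made-up sample string; the exact entities chosen in the output are left to the module and are not asserted here.

```perl
use strict;
use warnings;
use HTML::HTML5::Writer;

my $writer = HTML::HTML5::Writer->new;

# Sample text containing all four markup-significant characters.
my $raw = 'Fish & chips <b>"to go"</b>';

# Without extra arguments, '&', '<', '>' and '"' are not treated as special.
my $default = $writer->encode_entities($raw);

# Passing them via 'characters' asks for them to be encoded as well.
my $escaped = $writer->encode_entities($raw, characters => '&<>"');

print "default: $default\n";
print "escaped: $escaped\n";
```

The second form is likely the one to reach for when inserting untrusted text into element content or attribute values by hand.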

BUGS AND LIMITATIONS

Certain DOM constructs cannot be output in non-XML HTML. e.g.

 my $xhtml = <<XHTML;
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head><title>Test</title></head>
 <body><hr>This text is within the HR element</hr></body>
 </html>
 XHTML
 my $dom = XML::LibXML->new->parse_string($xhtml);
 my $writer = HTML::HTML5::Writer->new(markup=>'html');
 print $writer->document($dom);

There's no way to serialise that properly in HTML. Right now this module just outputs that HR element with text contained within it, a la XHTML. In future versions, it may emit a warning or throw an error.

In these cases, the HTML::HTML5::{Parser,Writer} combination is not round-trippable.
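One way to observe the round-trip problem is to serialise the DOM, parse the result again, and compare what the HR element contains before and after. The following is only a sketch: it assumes HTML::HTML5::Parser is installed, and the helper hr_text is a hypothetical name introduced for this example.

```perl
use strict;
use warnings;
use XML::LibXML;
use HTML::HTML5::Parser;
use HTML::HTML5::Writer;

my $xhtml = join '',
    '<html xmlns="http://www.w3.org/1999/xhtml">',
    '<head><title>Test</title></head>',
    '<body><hr>This text is within the HR element</hr></body>',
    '</html>';

# Parse as XML, write as HTML, then parse the HTML back into a DOM.
my $dom   = XML::LibXML->new->parse_string($xhtml);
my $html  = HTML::HTML5::Writer->new(markup => 'html')->document($dom);
my $again = HTML::HTML5::Parser->new->parse_string($html);

# Hypothetical helper: text content of the first hr element, or undef.
sub hr_text {
    my ($doc) = @_;
    my ($hr) = $doc->getElementsByLocalName('hr');
    return defined $hr ? $hr->textContent : undef;
}

my $before = hr_text($dom);
my $after  = hr_text($again);

for my $pair ([before => $before], [after => $after]) {
    my ($label, $text) = @$pair;
    print "$label: ", (defined $text ? "'$text'" : 'no hr element found'), "\n";
}
```

If the two values differ, the text has moved out of the HR element during the round trip, which is the limitation described above.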

Outputting elements and attributes in foreign (non-XHTML) namespaces is implemented pretty naively and not thoroughly tested. I'd be interested in any feedback people have, especially on round-trippability of SVG, MathML and RDFa content in HTML.

Please report any bugs to <http://rt.cpan.org/>.

SEE ALSO

HTML::HTML5::Parser, HTML::HTML5::Builder, HTML::HTML5::ToText, XML::LibXML.

AUTHOR

Toby Inkster.

COPYRIGHT AND LICENSE

Copyright (C) 2010-2012 by Toby Inkster.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.