SYNOPSIS

  use HTML::StripScripts::Parser();

  my $hss = HTML::StripScripts::Parser->new(

       {
           Context => 'Document',       ## HTML::StripScripts configuration
           Rules   => { ... },
       },

       strict_comment => 1,             ## HTML::Parser options
       strict_names   => 1,

  );

  $hss->parse_file("foo.html");

  print $hss->filtered_document;

  OR

  print $hss->filter_html($html);

DESCRIPTION

This class provides an easy interface to HTML::StripScripts, using HTML::Parser to parse the HTML.

See HTML::Parser for details of how to customise the way the raw HTML is parsed into tags, and HTML::StripScripts for details of how to customise the way those tags are filtered.

CONSTRUCTORS

new ( {CONFIG}, [PARSER_OPTIONS] )

Creates a new HTML::StripScripts::Parser object. The CONFIG parameter has the same semantics as the CONFIG parameter to the HTML::StripScripts constructor. Any PARSER_OPTIONS supplied will be passed on to the HTML::Parser init method, allowing you to influence the way the input is parsed. You cannot use PARSER_OPTIONS to set the HTML::Parser event handlers (see "Events" in HTML::Parser) since HTML::StripScripts::Parser uses all of the event hooks itself. However, you can use Rules (see "Rules" in HTML::StripScripts) to customise the handling of all tags and attributes.
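As a minimal sketch of the two argument groups: the CONFIG hash reference comes first, and any HTML::Parser options follow it as key/value pairs. The Rules entry below (rejecting br tags) is an illustrative assumption based on the rule syntax described in HTML::StripScripts, not a recommended setting.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser ();

my $hss = HTML::StripScripts::Parser->new(
    # First argument: the HTML::StripScripts CONFIG hash reference.
    {
        Context => 'Document',
        Rules   => { br => 0 },    # assumption for illustration: reject <br>
    },
    # Remaining arguments: HTML::Parser options, passed through unchanged.
    strict_comment => 1,
    strict_names   => 1,
);

# The object inherits from both classes, so input methods come from
# HTML::Parser and output methods from HTML::StripScripts.
print "ready\n" if $hss->isa('HTML::Parser') && $hss->isa('HTML::StripScripts');
```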

METHODS

See HTML::Parser for input methods, HTML::StripScripts for output methods. filter_html() is a convenience method for filtering HTML already loaded into a scalar variable. It combines calls to HTML::Parser::parse(), HTML::Parser::eof() and HTML::StripScripts::filtered_document().

$filtered_html = $hss->filter_html($html);
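The equivalence described above can be sketched as follows. The sample markup and the choice of the 'Flow' context (one of the contexts documented in HTML::StripScripts) are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser ();

my $html = '<p>Hello <script>alert(1)</script>world</p>';

# One call: filter_html() parses the string and returns the filtered result.
my $one_call = HTML::StripScripts::Parser->new( { Context => 'Flow' } )
                                         ->filter_html($html);

# The same work spelled out with the three underlying calls.
my $hss = HTML::StripScripts::Parser->new( { Context => 'Flow' } );
$hss->parse($html);                      # HTML::Parser input method
$hss->eof;                               # signal end of input
my $by_hand = $hss->filtered_document;   # HTML::StripScripts output method

print "identical\n" if $one_call eq $by_hand;
```

Use parse_file() or incremental parse() calls instead when the input is not already held in a single scalar.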

SUBCLASSING

The HTML::StripScripts::Parser class is subclassable. Filter objects are plain hashes. The hss_init() method takes the same arguments as new(), and calls the initialization methods of both HTML::StripScripts and HTML::Parser.

See "SUBCLASSING" in HTML::StripScripts and "SUBCLASSING" in HTML::Parser.
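A subclass can therefore bless its own hash, call hss_init() with the arguments it received, and keep extra state alongside the parent's. The sketch below is one possible arrangement, not a prescribed pattern: the package name My::CountingFilter, the hash key _my_filtered_count and the filtered_count() accessor are all hypothetical.

```perl
package My::CountingFilter;    # hypothetical subclass name
use strict;
use warnings;
use parent 'HTML::StripScripts::Parser';

sub new {
    my ( $class, $cfg, @parser_options ) = @_;
    my $self = bless {}, $class;               # filter objects are plain hashes
    $self->hss_init( $cfg, @parser_options );  # same arguments as new()
    $self->{_my_filtered_count} = 0;           # hypothetical subclass state
    return $self;
}

# Wrap filter_html() to count how many documents have been filtered.
sub filter_html {
    my ( $self, $html ) = @_;
    $self->{_my_filtered_count}++;
    return $self->SUPER::filter_html($html);
}

sub filtered_count { return $_[0]{_my_filtered_count} }

package main;

my $filter = My::CountingFilter->new( { Context => 'Flow' } );
$filter->filter_html('<b>one</b>');
$filter->filter_html('<i>two</i>');
print $filter->filtered_count, "\n";
```

Choose hash keys with a distinctive prefix so that they do not collide with keys used internally by the parent classes.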

SEE ALSO

HTML::StripScripts, HTML::Parser, HTML::StripScripts::LibXML

BUGS

None reported.

Please report any bugs or feature requests to [email protected], or through the web interface at <http://rt.cpan.org>.

AUTHOR

Original author Nick Cleaton <[email protected]>

New code added and module maintained by Clinton Gormley <[email protected]>

COPYRIGHT

Copyright (C) 2003 Nick Cleaton. All Rights Reserved.

Copyright (C) 2007 Clinton Gormley. All Rights Reserved.

LICENSE

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.