SYNOPSIS

  use MKDoc::XML::Stripper;

  my $stripper = MKDoc::XML::Stripper->new;
  $stripper->allow (qw /p class id/);

  my $ugly = '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>';
  my $neat = $stripper->process_data ($ugly);
  print $neat;

Should print:

<p class="para">Hello, World!</p>

SUMMARY

MKDoc::XML::Stripper is a class which lets you specify a set of tags and attributes which you want to allow, and then cheekily strip any XML of unwanted tags and attributes.

In MKDoc, this is used to make editors write structural XHTML rather than presentational markup: anything which looks like a <font> tag, a 'style' attribute, or any other markup which would break the separation of structure from presentation is stripped.
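To make the allow-list idea concrete, here is a minimal pure-Perl sketch of the technique. It is not how MKDoc::XML::Stripper is implemented: the strip_xml() function and its hash-of-hashes allow-list are hypothetical, and the regex-based splitting is deliberately naive (no handling of comments, CDATA or entities). Disallowed tags are dropped while their text content is kept, and only allowed attributes survive on allowed tags.

```perl
use strict;
use warnings;

# Hypothetical sketch of allow-list stripping; NOT the MKDoc::XML::Stripper
# implementation. $allow maps a tag name to a hash of permitted attributes.
sub strip_xml {
    my ($allow, $xml) = @_;
    my $out = '';

    # Split into markup pieces (<...>) and the text between them.
    for my $tok (split /(<[^>]*>)/, $xml) {
        next if $tok eq '';
        if ($tok !~ /^</) {
            $out .= $tok;                      # text is always kept
            next;
        }
        # Closing tag: keep it only if the tag itself is allowed.
        if ($tok =~ m{^</\s*([\w:.-]+)\s*>$}) {
            $out .= "</$1>" if $allow->{$1};
            next;
        }
        # Opening or self-closing tag with optional attributes.
        if ($tok =~ m{^<\s*([\w:.-]+)(.*?)(/?)>$}s) {
            my ($name, $attrs, $slash) = ($1, $2, $3);
            next unless $allow->{$name};       # drop disallowed tags
            my $kept = '';
            while ($attrs =~ /([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')/g) {
                $kept .= " $1=$2" if $allow->{$name}{$1};
            }
            $out .= "<$name$kept" . ($slash ? ' />' : '>');
        }
        # Anything else (comments, processing instructions...) is dropped.
    }
    return $out;
}

my %allow = (p => { class => 1, id => 1 });
print strip_xml(\%allow,
    '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>'), "\n";
```

With the allow-list from the SYNOPSIS (p with class and id), this sketch prints the same result as the module example: <p class="para">Hello, World!</p>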

DISCLAIMER

This module does low-level XML manipulation. It will somehow parse even broken XML and try to do something with it. Do not use it unless you know what you're doing.

API

new() instantiates a new MKDoc::XML::Stripper object.

load_def() loads a definition located somewhere in @INC under MKDoc/XML/Stripper.

Available definitions are:

xhtml10frameset
xhtml10strict
xhtml10transitional
mkdoc16 - MKDoc 1.6 XHTML structural markup

You can also load your own definition file, for instance:

$stripper->load_def ('my_def.txt');

Definitions are simple text files as follows:

  # allow p with 'class' and id
  p class
  p id

  # allow more stuff
  td class
  td id
  td style

  # etc...

$stripper->allow ($tag, @attributes) allows <$tag> to appear in the stripped XML. Additionally, it allows @attributes to appear as attributes of <$tag>, so for instance:

$stripper->allow ('p', 'class', 'id');

Will allow the following:

<p>
<p class="foo">
<p id="bar">
<p class="foo" id="bar">

However, any extra attributes will be stripped; for example,

<p class="foo" id="bar" style="font-color: red">

Will be rewritten as

<p class="foo" id="bar">

disallow() explicitly disallows a tag and all its associated attributes. By default, everything is disallowed.

process_data ($some_xml) strips $some_xml according to the rules that were given with the allow() and disallow() methods and returns the result. It does not modify $some_xml in place.

A file-based counterpart strips '/an/xml/file.xml' according to the rules that were given with the allow() and disallow() methods and returns the result. It does not modify '/an/xml/file.xml' in place.
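The definition file format and the allow() rules described above can be sketched as follows. parse_def() and rewrite_start_tag() are hypothetical helpers, not part of the MKDoc::XML::Stripper API. The sketch assumes that each non-comment line of a definition holds a tag followed by attribute names, and that '#' starts a comment anywhere on a line.

```perl
use strict;
use warnings;

# Hypothetical parser for the definition format: blank lines and '#'
# comments are ignored; every other line is assumed to be
# "tag attribute ..." (a line with only a tag allows the bare tag).
sub parse_def {
    my ($text) = @_;
    my %allow;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                       # strip comments (assumption)
        my ($tag, @attrs) = split ' ', $line;   # whitespace-separated fields
        next unless defined $tag;               # skip empty lines
        $allow{$tag} ||= {};
        $allow{$tag}{$_} = 1 for @attrs;
    }
    return \%allow;
}

# Keep only the allowed attributes of a single start tag. Returns undef
# when the tag is not allowed at all (everything is disallowed by default).
sub rewrite_start_tag {
    my ($allow, $tag) = @_;
    my ($name, $attrs) = $tag =~ /^<\s*([\w:.-]+)(.*?)\s*>$/s or return;
    return unless $allow->{$name};
    my $kept = '';
    while ($attrs =~ /([\w:.-]+)\s*=\s*("[^"]*"|'[^']*')/g) {
        $kept .= " $1=$2" if $allow->{$name}{$1};
    }
    return "<$name$kept>";
}

my $allow = parse_def(<<'DEF');
# allow p with 'class' and id
p class
p id

# allow more stuff
td class
td id
td style
DEF

print rewrite_start_tag($allow, '<p class="foo" id="bar" style="font-color: red">'), "\n";
```

Run against the <p class="foo" id="bar" style="font-color: red"> example, it prints <p class="foo" id="bar">. A tag absent from the definition, such as <font>, yields undef.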

NOTES

MKDoc::XML::Stripper does not really parse the XML file you give it, nor does it care whether the XML is well-formed. It uses MKDoc::XML::Tokenizer to turn the XML / XHTML file into a series of MKDoc::XML::Token objects and operates strictly on that list of tokens.

For the same reason, MKDoc::XML::Stripper does not support namespaces.
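The token-based approach described in these notes can be illustrated with a toy tokenizer. This is a hypothetical sketch, not MKDoc::XML::Tokenizer: the tokenize() function and its [type, value] pairs are assumptions. It splits on anything that looks like <...> and labels each piece, without checking that tags are balanced or resolving namespace prefixes.

```perl
use strict;
use warnings;

# Hypothetical token-based view of markup (NOT MKDoc::XML::Tokenizer).
# Each token is a [type, value] pair: 'end' for closing tags, 'tag' for
# any other markup (start, self-closing, comments...), 'text' otherwise.
# Nothing checks that tags are balanced.
sub tokenize {
    my ($xml) = @_;
    my @tokens;
    for my $piece (split /(<[^>]*>)/, $xml) {
        next if $piece eq '';
        if    ($piece =~ m{^</}) { push @tokens, [ end  => $piece ] }
        elsif ($piece =~ m{^<})  { push @tokens, [ tag  => $piece ] }
        else                     { push @tokens, [ text => $piece ] }
    }
    return @tokens;
}

# Broken XML (the <b> is never closed) still yields a flat token list,
# and "x:p" is just an opaque tag name: no namespace resolution happens.
for my $t (tokenize('<x:p>Hello <b>World</x:p>')) {
    printf "%-5s %s\n", @$t;
}
```

For the unbalanced input above, this sketch emits five tokens, and the prefixed name x:p is treated like any other tag name.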

AUTHOR

Copyright 2003 - MKDoc Holdings Ltd.

Author: Jean-Michel Hiver

This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.

SEE ALSO

MKDoc::XML::Tokenizer
MKDoc::XML::Token