SYNOPSIS

    use HTML::FormatText::WithLinks;

    my $f = HTML::FormatText::WithLinks->new();

    my $html = qq(
    <html>
    <body>
    <p>
        Some html with a <a href="http://example.com/">link</a>
    </p>
    </body>
    </html>
    );

    my $text = $f->parse($html);

    print $text;

    # results in something like
    #
    #   Some html with a [1]link
    #
    #   1. http://example.com/

    my $f2 = HTML::FormatText::WithLinks->new(
        before_link => '',
        after_link => ' [%l]',
        footnote => ''
    );

    $text = $f2->parse($html);
    print $text;

    # results in something like
    #
    #   Some html with a link [http://example.com/]

    my $f3 = HTML::FormatText::WithLinks->new(
        link_num_generator => sub {
            return "*" x (shift() + 1);
        },
        footnote => '[%n] %l'
    );

    $text = $f3->parse($html);
    print $text;

    # results in something like
    #
    #   Some html with a [*]link
    #
    #   [*] http://example.com/

DESCRIPTION

HTML::FormatText::WithLinks takes HTML and turns it into plain text, but prints all the links in the HTML as footnotes. By default, it attempts to mimic the format of the lynx text-based web browser's --dump option.

METHODS

new

my $f = HTML::FormatText::WithLinks->new( %options );

Returns a new instance. It accepts all the options of HTML::FormatText, plus the following:

base

A URI that will be used to turn any relative URIs in the HTML into absolute ones.
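As a minimal sketch (assuming HTML::FormatText::WithLinks and its prerequisites are installed; the URLs are invented examples), a relative link can be resolved against a base URI like this:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

# Relative hrefs are resolved against the 'base' URI before being printed.
my $f = HTML::FormatText::WithLinks->new(
    base => 'http://example.com/docs/',
);

my $html = '<html><body><p>See the <a href="intro.html">introduction</a>.</p></body></html>';
my $text = $f->parse($html);

# The footnote should show the absolute URL http://example.com/docs/intro.html
print $text;
```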

doc_overrides_base

If a base element is found in the document and it has an href attribute, then setting doc_overrides_base to true will cause the document's base to be used instead of the base option. This defaults to false.
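A minimal sketch of doc_overrides_base, assuming the module is installed; the example.org and example.com addresses are placeholders. Here the document's own <base href> should win over the base option when resolving the relative link:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

# The document carries its own <base href>; with doc_overrides_base set,
# that base is used to resolve relative links instead of the 'base' option.
my $html = <<'HTML';
<html>
<head><base href="http://example.org/manual/"></head>
<body><p>Read <a href="chapter1.html">chapter one</a>.</p></body>
</html>
HTML

my $f = HTML::FormatText::WithLinks->new(
    base               => 'http://example.com/',
    doc_overrides_base => 1,
);

my $text = $f->parse($html);
print $text;
```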

before_link (default: '[%n]')
after_link (default: '')
footnote (default: '%n. %l')

Strings to print before a link (i.e. when the <a> is found), after the link has ended (i.e. when the </a> is found) and when printing out footnotes. "%n" will be replaced by the link number and "%l" by the link itself. If footnote is set to '', no footnotes will be printed.
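For instance, the reference number can be moved after the link text, in the style of many printed documents. This is a sketch assuming the module is installed; the HTML is an invented example:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

# Print nothing before the link, " [n]" after it, and "[n] url" as footnotes.
my $f = HTML::FormatText::WithLinks->new(
    before_link => '',
    after_link  => ' [%n]',
    footnote    => '[%n] %l',
);

my $text = $f->parse(
    '<html><body><p>Some html with a <a href="http://example.com/">link</a></p></body></html>'
);
print $text;
```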

link_num_generator (default: sub { return shift() + 1 })

A sub that is passed the internal link number and returns the value to print wherever "%n" appears. The internal numbering starts at 0, so the default prints 1 for the first link.
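As a sketch (assuming the module is installed; the URLs are invented), a generator can turn the zero-based index into letters. This simple mapping only makes sense for up to 26 links:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

# link_num_generator receives the zero-based link index;
# here 0, 1, 2, ... become a, b, c, ... (fine for up to 26 links).
my $f = HTML::FormatText::WithLinks->new(
    link_num_generator => sub { my $index = shift; return chr( ord('a') + $index ) },
    footnote           => '[%n] %l',
);

my $text = $f->parse(<<'HTML');
<html><body>
<p>First <a href="http://example.com/one">link</a>
and second <a href="http://example.com/two">link</a>.</p>
</body></html>
HTML
print $text;
```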

with_emphasis

If set to 1, italicised text will be surrounded by "/" and bold text by "_". You can change these markers with the italic_marker and bold_marker options.
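A minimal sketch, assuming the module is installed and supports the italic_marker and bold_marker options described above; the markers "~" and "*" are arbitrary choices for illustration:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

# with_emphasis wraps <i> and <b> text in markers; the defaults
# ('/' and '_') are replaced here by '~' and '*'.
my $f = HTML::FormatText::WithLinks->new(
    with_emphasis => 1,
    italic_marker => '~',
    bold_marker   => '*',
);

my $text = $f->parse(
    '<html><body><p>Some <i>italic</i> and <b>bold</b> text</p></body></html>'
);
print $text;
```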

unique_links

If set to 1, only one footnote will be generated per unique URI, as opposed to the default behaviour of generating a footnote for every link.
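The difference can be seen by counting footnotes for a document that links to the same URI twice. A sketch, assuming the module is installed; the url_count helper is hypothetical and relies on the URL appearing only in the footnotes of this example:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

my $html = <<'HTML';
<html><body>
<p>Visit the <a href="http://example.com/">home page</a>,
then go back to the <a href="http://example.com/">home page</a> again.</p>
</body></html>
HTML

# Hypothetical helper: count how often the URL is printed
# (in this example it only appears in the footnotes).
sub url_count {
    my ($text) = @_;
    my @hits = $text =~ m{http://example\.com/}g;
    return scalar @hits;
}

my $per_link = HTML::FormatText::WithLinks->new()->parse($html);
my $unique   = HTML::FormatText::WithLinks->new( unique_links => 1 )->parse($html);

printf "default: %d footnote(s), unique_links: %d footnote(s)\n",
    url_count($per_link), url_count($unique);
```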

anchor_links

If set to 0, links pointing to local anchors (e.g. href="#section") will be skipped. The default behaviour is to include all links.
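A sketch assuming the module is installed; the anchor name and URL are invented. With anchor_links set to 0, only the external link should be numbered and footnoted:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

my $html = <<'HTML';
<html><body>
<p><a href="#details">Jump to details</a> or read
<a href="http://example.com/more">more online</a>.</p>
</body></html>
HTML

# anchor_links => 0 skips links whose href points to a local anchor,
# so only the external link gets a marker and a footnote.
my $f = HTML::FormatText::WithLinks->new( anchor_links => 0 );
my $text = $f->parse($html);
print $text;
```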

skip_linked_urls

If set to 1, then links where the text equals the href value will be skipped. The default behaviour is to include all links.
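A sketch assuming the module is installed; the URLs are invented. The first link's text is its own URL, so it should be left as plain text, while the second link is numbered as usual:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

my $html = <<'HTML';
<html><body>
<p>Homepage: <a href="http://example.com/">http://example.com/</a>.
See also the <a href="http://example.com/faq">FAQ</a>.</p>
</body></html>
HTML

# The first link's text is identical to its href, so with skip_linked_urls
# it gets no marker or footnote; only the FAQ link is numbered.
my $f = HTML::FormatText::WithLinks->new( skip_linked_urls => 1 );
my $text = $f->parse($html);
print $text;
```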

parse

my $text = $f->parse($html);

Takes some HTML and returns it as text. Returns undef on error.

Also returns undef if you pass it undef, and returns an empty string if passed an empty string.
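A short sketch of checking these return values, assuming the module is installed; the HTML is an invented example:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

my $f = HTML::FormatText::WithLinks->new();

# Normal use: undef signals an error, with details available from error().
my $text = $f->parse('<html><body><p>Hello, <a href="http://example.com/">world</a></p></body></html>');
die 'parse failed: ' . ( $f->error() // 'unknown error' ) . "\n" unless defined $text;

# Edge cases: undef in gives undef out; an empty string gives an empty string.
my $from_undef = $f->parse(undef);
my $from_empty = $f->parse('');

print defined $from_undef ? "defined\n" : "undef\n";
print $from_empty eq '' ? "empty string\n" : "non-empty\n";
```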

parse_file

my $text = $f->parse_file($filename);

Takes a filename and returns the contents of the file as plain text. Returns undef on error.
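A sketch assuming the module is installed; it writes an invented HTML snippet to a temporary file with the core File::Temp module purely so the example is self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::FormatText::WithLinks;

# Write a small HTML document to a temporary file (illustration only).
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} '<html><body><p>A <a href="http://example.com/">link</a></p></body></html>';
close $fh or die "close failed: $!";

my $f    = HTML::FormatText::WithLinks->new();
my $text = $f->parse_file($filename);

die 'parse_file failed: ' . ( $f->error() // 'unknown error' ) . "\n"
    unless defined $text;
print $text;
```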

error

$f->error();

Returns the last error that occurred. In practice this is likely to be either a message that parse_file couldn't find the file or one saying that HTML::TreeBuilder failed.
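A sketch of reporting such an error, assuming the module is installed; no-such-file.html is a made-up name that is expected not to exist:

```perl
use strict;
use warnings;
use HTML::FormatText::WithLinks;

my $f = HTML::FormatText::WithLinks->new();

# 'no-such-file.html' is a hypothetical path that should not exist.
my $text = $f->parse_file('no-such-file.html');

unless ( defined $text ) {
    warn 'Could not convert file: ', $f->error(), "\n";
}
```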

CAVEATS

When passing HTML fragments, the results may be a little unpredictable. I've tried to work around the most egregious of the issues, but reports of any unexpected results are welcome.

Also note that if for some reason there is an <a> tag in the document that does not have an href attribute, it will be quietly ignored. If this is really a problem for anyone then let me know and I'll see if I can think of a sensible thing to do in this case.

AUTHOR

Struan Donald. <[email protected]>

<http://www.exo.org.uk/code/>

Ian Malpass <[email protected]> was responsible for the custom formatting bits and the nudge to release the code.

Simon Dassow <[email protected]> for the anchor_links option, plus a few bugfixes and optimisations.

Kevin Ryde for the code for pulling the base out of the document.

Thomas Sibley <[email protected]> for patches to skip links whose text is their URL and to change the markers for bold and italic text.

COPYRIGHT

Copyright (C) 2003-2010 Struan Donald and Ian Malpass. All rights reserved.

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1), HTML::Formatter.