SYNOPSIS

    $ urifind file

DESCRIPTION

urifind is a simple script that finds URIs in one or more files (using URI::Find) and prints them to STDOUT. That's it.

To find all the URIs in file1, use:

    $ urifind file1

To find the URIs in multiple files, simply list them as arguments:

    $ urifind file1 file2 file3

urifind reads from STDIN if no files are given or if a filename of - is specified:

    $ wget http://www.boston.com/ -O - | urifind

When multiple files are listed, urifind prefixes each URI it finds with the name of the file it came from:

    $ urifind file1 file2
    file1: http://www.boston.com/index.html
    file2: http://use.perl.org/

The prefix can be turned on for a single file with the -p ("prefix") switch:

    $ urifind -p file3
    file3: http://fsck.com/rt/

It can also be turned off for multiple files with the -n ("no prefix") switch:

    $ urifind -n file1 file2
    http://www.boston.com/index.html
    http://use.perl.org/
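
The prefix rules above can be summarized in a few lines. This is only an illustrative sketch in Python (urifind itself is a Perl script); the function names are hypothetical, and giving -n precedence when both switches are present is an assumption, since the text does not say what happens in that case.

```python
def use_prefix(n_files, p=False, n=False):
    """Decide whether each URI should be prefixed with its filename."""
    if n:               # -n: never prefix (precedence over -p is an assumption)
        return False
    if p:               # -p: always prefix, even for a single file
        return True
    return n_files > 1  # default: prefix only when several files are given


def format_hit(filename, uri, prefix):
    """Render one output line, with or without the 'file: ' prefix."""
    return f"{filename}: {uri}" if prefix else uri


print(format_hit("file3", "http://fsck.com/rt/", use_prefix(1, p=True)))
```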

By default, URIs are displayed in the order found. To sort them ASCII-betically, use the -s ("sort") option. To reverse-sort them, use the -r ("reverse") flag (-r implies -s).

    $ urifind -s file1 file2
    http://use.perl.org/
    http://www.boston.com/index.html
    mailto:[email protected]

    $ urifind -r file1 file2
    mailto:[email protected]
    http://www.boston.com/index.html
    http://use.perl.org/
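
As a rough illustration of the ordering rules, the Python sketch below sorts plain strings by character code, which is what "ASCII-betically" amounts to for ASCII input. The function name and the mailto address are placeholders invented for this example, not part of urifind.

```python
def order_uris(uris, sort=False, reverse=False):
    """Return URIs in found order, sorted, or reverse-sorted."""
    if reverse:
        sort = True              # -r implies -s
    if not sort:
        return list(uris)        # default: keep the order in which URIs were found
    return sorted(uris, reverse=reverse)  # plain string comparison


found = [
    "http://www.boston.com/index.html",
    "mailto:someone@example.org",   # placeholder address
    "http://use.perl.org/",
]
for uri in order_uris(found, sort=True):
    print(uri)
```

Because "h" sorts before "m", the http URIs come before the mailto URI, and "use" sorts before "www".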

Finally, urifind supports limiting the returned URIs by scheme or by arbitrary pattern, using the -S option (for schemes) and the -P option (for patterns). Both -S and -P can be specified multiple times:

    $ urifind -S mailto file1
    mailto:[email protected]

    $ urifind -S mailto -S http file1
    mailto:[email protected]
    http://www.boston.com/index.html

-P takes an arbitrary Perl regex. It may need to be quoted to protect it from the shell:

    $ urifind -P 's?html?' file1
    http://www.boston.com/index.html

    $ urifind -P '\.org\b' -S http file4
    http://www.gnu.org/software/wget/wget.html

Add -d to have urifind dump the regexes generated from -S and -P to STDERR. -D does the same but exits immediately:

    $ urifind -P '\.org\b' -S http -D
    $scheme = '^(\bhttp\b):'
    @pats = ('^(\bhttp\b):', '\.org\b')
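
The dump suggests how the two options combine: the schemes become one anchored regex, which is placed in the same list as the -P patterns. The Python sketch below mirrors that shape. It is not urifind's code; the function names are hypothetical, joining several schemes with | is an assumption (the dump only shows a single scheme), and so is the rule that a URI must match every pattern in the list.

```python
import re


def build_patterns(schemes, patterns):
    """Build a pattern list shaped like the @pats shown in the -D dump."""
    pats = []
    if schemes:
        # Assumption: multiple -S values are joined into one alternation.
        alternation = "|".join(r"\b" + s + r"\b" for s in schemes)
        pats.append("^(" + alternation + "):")
    pats.extend(patterns)
    return pats


def keep(uri, pats):
    """Assumption: a URI is kept only if every pattern matches it."""
    return all(re.search(p, uri) for p in pats)


pats = build_patterns(["http"], [r"\.org\b"])
print(pats)
print(keep("http://www.gnu.org/software/wget/wget.html", pats))
```

With these inputs the list has the same two entries as the dump above, and the gnu.org URI passes both.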

To remove duplicates from the results, use the -u ("unique") switch.
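
A minimal Python sketch of duplicate removal follows. Keeping the first occurrence of each URI in its original position is an assumption made for the example; the text only says that duplicates are removed. The function name is hypothetical.

```python
def unique(uris):
    """Drop repeated URIs, keeping the first occurrence of each (assumed order)."""
    # dict preserves insertion order, so this de-duplicates without reordering.
    return list(dict.fromkeys(uris))


print(unique([
    "http://use.perl.org/",
    "http://www.boston.com/index.html",
    "http://use.perl.org/",
]))
```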

OPTION SUMMARY

-s

Sort results.

-r

Reverse sort results (implies -s).

-u

Return unique results only.

-n

Don't include filename in output.

-p

Include filename in output (0 by default, but 1 if multiple files are included on the command line).

-P $re

Print only lines matching regex '$re' (may be specified multiple times).

-S $scheme

Only this scheme (may be specified multiple times).

-h

Help summary.

-v

Display version and exit.

-d

Dump compiled regexes for -S and -P to STDERR.

-D

Same as -d, but exit after dumping.

AUTHOR

darren chamberlain <[email protected]>

COPYRIGHT

(C) 2003 darren chamberlain

This library is free software; you may distribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

URI::Find