SYNOPSIS

    use Regexp::Common qw /URI/;

    while (<>) {
        /$RE{URI}{FTP}/       and  print "Contains an FTP URI.\n";
    }

DESCRIPTION

Returns a regex for FTP URIs. Note: FTP URIs are not formally defined. RFC 1738 defines FTP URLs, but parts of that RFC have been obsoleted by RFC 2396. However, the changes made by RFC 2396 cannot be applied straightforwardly to FTP URIs.

There are two main problems:

Passwords.

RFC 1738 allowed an optional username and an optional password (separated by a colon) in the FTP URL. Hence, colons were not allowed in either the username or the password. RFC 2396 strongly recommends that passwords not be used in URIs. It allows for a userinfo part instead. This userinfo part may contain colons, and may therefore contain more than one colon. The regex returned follows the RFC 2396 specification, unless the {-password} option is given; then the regex allows for an optional username and password, separated by a colon.

The ;type specifier.

RFC 1738 does not allow semi-colons in FTP path names, because a semi-colon is a reserved character for FTP URIs. The semi-colon is used to separate the path from the optional type specifier. However, in RFC 2396, paths consist of slash separated segments, and each segment is a semi-colon separated group of parameters. Straightforward application of RFC 2396 would mean that a trailing type specifier couldn't be distinguished from the last segment of the path having two parameters, the last one starting with type=. Therefore we have opted to disallow a semi-colon in the path part of an FTP URI.

Furthermore, RFC 1738 allows three values for the type specifier, A, I and D (either upper case or lower case). However, the internet draft about FTP URIs [DRAFT-URL-FTP] (which expired in May 1997) notes the lack of consistent implementation of the D parameter and drops D from the set of possible values. We follow this practice; however, RFC 1738 behaviour can be achieved by using the -type => "[ADIadi]" parameter.
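The effect of the two options can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the module: the host ftp.example.com, the credentials and the paths are invented, and every pattern is anchored with ^ and \z so that a partial match does not hide the difference between the variants.

```perl
use strict;
use warnings;
use Regexp::Common qw /URI/;

# Hypothetical sample URIs; host, credentials and paths are invented.
my $two_colons = 'ftp://user:pass:word@ftp.example.com/pub/file.txt';
my $type_d     = 'ftp://ftp.example.com/pub/dir;type=d';

# The three variants described above.
my $default  = $RE{URI}{FTP};
my $password = $RE{URI}{FTP}{-password};
my $rfc1738  = $RE{URI}{FTP}{-type => '[ADIadi]'};

# An RFC 2396 userinfo part may hold several colons;
# the user:password form of {-password} may not.
my $userinfo_ok = $two_colons =~ /^$default\z/  ? 1 : 0;
my $password_ok = $two_colons =~ /^$password\z/ ? 1 : 0;

# type=d is only accepted once -type widens the set of values.
my $d_default = $type_d =~ /^$default\z/ ? 1 : 0;
my $d_rfc1738 = $type_d =~ /^$rfc1738\z/ ? 1 : 0;

print "two colons: default=$userinfo_ok, -password=$password_ok\n";
print "type=d:     default=$d_default, -type=$d_rfc1738\n";
```

Without the anchors, either pattern may still match a leading portion of such a string, so anchoring (or comparing the match against the whole input) is advisable when validating rather than searching.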

FTP URIs have the following syntax:

    "ftp:" "//" [ userinfo "@" ] host [ ":" port ] [ "/" path [ ";type=" value ]]

When using {-password}, we have the syntax:

    "ftp:" "//" [ user [ ":" password ] "@" ] host [ ":" port ] [ "/" path [ ";type=" value ]]

Under {-keep}, the following are returned:

$1   The complete URI.
$2   The scheme.
$3   The userinfo, or if {-password} is used, the username.
$4   If {-password} is used, the password, else undef.
$5   The hostname or IP address.
$6   The port number.
$7   The full path and type specification, including the leading slash.
$8   The full path and type specification, without the leading slash.
$9   The full path, with neither the type specification nor the leading slash.
$10  The value of the type specification.
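As a rough sketch of how the captures might be consumed, the following splits a URI into named parts using {-password} together with {-keep}. The URI itself and the hash key names are made up for illustration; only the order of the ten captures comes from the description above.

```perl
use strict;
use warnings;
use Regexp::Common qw /URI/;

# Hypothetical URI; the host, credentials and path are invented.
my $uri = 'ftp://anonymous:guest@ftp.example.com:21/pub/README;type=a';

my $keep = $RE{URI}{FTP}{-password}{-keep};

my %part;
if ($uri =~ /^$keep\z/) {
    # Name the ten captures in the order they are listed above.
    # The key names are this example's own choice.
    @part{qw(uri scheme user password host port
             slash_path_type path_type path type)}
        = ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10);
}

for my $name (sort keys %part) {
    printf "%-16s %s\n", $name, $part{$name} // '(undef)';
}
```

Parts that are absent from the URI, such as the port or the type specification, come back undefined, so callers should check for definedness before using them.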

REFERENCES

[DRAFT-URL-FTP]

Casey, James: A FTP URL Format. November 1996.

[RFC 1738]

Berners-Lee, Tim, Masinter, L., McCahill, M.: Uniform Resource Locators (URL). December 1994.

[RFC 2396]

Berners-Lee, Tim, Fielding, R., and Masinter, L.: Uniform Resource Identifiers (URI): Generic Syntax. August 1998.

SEE ALSO

Regexp::Common::URI for other supported URIs.

AUTHOR

Damian Conway

MAINTENANCE

This package is maintained by Abigail.

BUGS AND IRRITATIONS

Bound to be plenty.

LICENSE and COPYRIGHT

This software is Copyright (c) 2001 - 2009, Damian Conway and Abigail.

This module is free software, and may be used under any of the following licenses:

1) The Perl Artistic License. See the file COPYRIGHT.AL.
2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2.
3) The BSD Licence. See the file COPYRIGHT.BSD.
4) The MIT Licence. See the file COPYRIGHT.MIT.