NAME
URI::Find - Find URIs in arbitrary text
SYNOPSIS
    require URI::Find;
    my $finder = URI::Find->new(\&callback);
    $how_many_found = $finder->find(\$text);
DESCRIPTION
This module does one thing: it finds URIs and URLs in plain text. It finds
them quickly and it finds them all (or at least everything URI::URL
considers a URI). It only finds URIs which include a scheme (http:// or the
like); for something a bit less strict, have a look at URI::Find::Schemeless.
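The scheme requirement is easiest to see side by side. The following is a minimal sketch, not part of the module's own documentation: the sample text and the collect_uris helper are made up for illustration, and it assumes URI::Find::Schemeless accepts the same callback interface as URI::Find.

```perl
use strict;
use warnings;
use URI::Find;
use URI::Find::Schemeless;

# Hypothetical sample text: one URI with a scheme, one without.
my $sample = "See http://www.example.com/ and also www.example.org for details";

# Hypothetical helper: run the given finder class over a copy of the
# text and return the count plus the original text of each URI found.
sub collect_uris {
    my ($class, $text) = @_;
    my @seen;
    my $finder = $class->new(sub {
        my ($uri, $orig_uri) = @_;
        push @seen, $orig_uri;
        return $orig_uri;    # leave the text unchanged
    });
    my $count = $finder->find(\$text);
    return ($count, @seen);
}

my ($strict_count, @strict_seen) = collect_uris('URI::Find', $sample);
my ($loose_count,  @loose_seen)  = collect_uris('URI::Find::Schemeless', $sample);

print "URI::Find found $strict_count: @strict_seen\n";
print "URI::Find::Schemeless found $loose_count: @loose_seen\n";
```

With this input, URI::Find should report only the http:// URI, while URI::Find::Schemeless should also pick up the bare www.example.org.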
For a command-line interface, see Darren Chamberlain's "urifind" script,
available from his CPAN directory.
EXAMPLES
Store a list of all URIs (normalized) in the document.
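    my @uris;
    my $finder = URI::Find->new(sub {
        my($uri, $orig_uri) = @_;
        push @uris, $uri;
        # The callback's return value replaces the URI in the text,
        # so return the original text to leave the document unchanged.
        return $orig_uri;
    });
    $finder->find(\$text);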
Print the original URI text found and the normalized representation.
    my $finder = URI::Find->new(sub {
        my($uri, $orig_uri) = @_;
        print "The text '$orig_uri' represents '$uri'\n";
        return $orig_uri;
    });
    $finder->find(\$text);
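To make the difference between the two callback arguments concrete, here is a small self-contained sketch. The sample text is invented, and the exact normalization applied is an assumption: the first argument is expected to be the URI in normalized form (for instance with scheme and host lowercased), while the second is the text exactly as it appeared.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical sample text containing a URI written in mixed case.
my $text = "Go to HTTP://WWW.Example.COM/path now";

my @pairs;
my $finder = URI::Find->new(sub {
    my ($uri, $orig_uri) = @_;
    # Stringify the URI object so it can be stored and compared.
    push @pairs, [ "$uri", $orig_uri ];
    return $orig_uri;    # leave the text unchanged
});
my $count = $finder->find(\$text);

for my $pair (@pairs) {
    my ($normalized, $original) = @$pair;
    print "original:   $original\n";
    print "normalized: $normalized\n";
}
```

Because the callback returns $orig_uri, $text is left as it was; only the recorded pairs differ.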
Check each URI in document to see if it exists.
    use LWP::Simple;
    my $finder = URI::Find->new(sub {
        my($uri, $orig_uri) = @_;
        if( head $uri ) {
            print "$orig_uri is okay\n";
        }
        else {
            print "$orig_uri cannot be found\n";
        }
        return $orig_uri;
    });
    $finder->find(\$text);
Turn plain text into HTML, with each URI found wrapped in an HTML
anchor.
    use CGI qw(escapeHTML);
    use URI::Find;
    my $finder = URI::Find->new(sub {
        my($uri, $orig_uri) = @_;
        return qq|<a href="$uri">$orig_uri</a>|;
    });
    $finder->find(\$text, \&escapeHTML);
    print "<pre>$text</pre>";
AUTHOR
Michael G Schwern with insight from Uri Gutman, Greg
Bacon, Jeff Pinyan, Roderick Schertler and others.
Roderick Schertler maintained versions 0.11 to
0.16.
LICENSE
Copyright 2000, 2009 by Michael G Schwern.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
See http://www.perlfoundation.org/artistic_license_1_0
SEE ALSO
URI::Find::Schemeless, URI::URL, URI, RFC 3986 Appendix C