NAME WWW::Find - Web Resource Finder SYNOPSIS use LWP::UserAgent; use HTTP::Request; use WWW::Find; $agent = LWP::UserAgent->new; $request = HTTP::Request->new(GET => 'http://begin.url'); $find = WWW::Find->new(AGENT => $agent, REQUEST => $request, MAX_DEPTH => 2, MATCH_SUB => \&match, FOLLOW_SUB => \&follow ); $find->go; DESCRIPTION WWW::Find is a Perl module that simplifies the task of searching the web for specific types of information. The inspiration for this project came from the recursive website mirroring program, w3mir. WWW::Find is similar to w3mir, but with a more general feature set. In a nutshell, a WWW::Find object extracts all the HREF links from an HTML document, creates a HTTP::Request object for each link, matches the HTTP::Response object against user specified criteria, and then does something with the matching links (possibly performing the entire operation all over again on certain links). Be careful not to set the MAX_DEPTH parameter too high, otherwise you could easily begin the end- less task of requesting every page on the net! In addition to a LPW::UserAgent and a HTTP::Request object, you'll need to create two subroutines: a &match subroutine and a &follow subrou- tine. The &follow subroutine should attempt to match the HTTP::Response object against user defined criteria. If a match is found, the entire operation is performed all over again on the matching link. For exam- ple, the following subroutine matches all links where the header con- tent-type matches the regular expression /text/. sub follow { my $find_obj = shift; my $header = HTTP::Request->new(HEAD => $find_obj->{REQUEST}->uri); my $response = $find_obj->{AGENT}->request($header) || next; $response->content_type =~ /text/io ? return 1 : return 0; } The &match subroutine should perform some action on links matching user defined criteria. For example, the following subroutine simply prints out the URL of all links matching the regular expression /html?$/ sub match { my $find_obj = shift; if($find_obj->{REQUEST}->uri =~ /html?$/io) { print $find_obj->{REQUEST}->uri . "\n"; } return; } DEPENDENCIES HTML::LinkExtor LWP::UserAgent HTTP::Request URI SEE ALSO HTTP::Request LPW::UserAgent AUTHOR Nathaniel Graham, <broom@cpan.org<gt> http://www.gnusto.net is the offical home page of WWW::Find COPYRIGHT AND LICENSE Copyright 2003 by Nathaniel Graham This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.