Latest version is 0.1a
A-Spider aims to give end users the power of a multi-threaded Java web spider built on Apache Commons HttpClient v3.0.
There is a donation page set up here, should you feel compelled to give a little back for such a fine program.
- Add a GUI - ASpider only takes input from command-line arguments. It would be nice (and fairly easy) to put a java.awt GUI on top of it.
- Binary file parsing - Support for parsing PDF, office documents, Shockwave, audio file tags and other binary files found on the net would be a big help. Open source tools to parse these files already exist; we just need to add the support to ASpider.
- HTML Playground - Add HTML pages to ASpider's existing HTML playground test site covering all possible combinations of frames, links, tags, malformed HTML and HTML anomalies. This should include style sheets, JavaScript and meta tags in all their various forms. The idea is to create all possible HTML scenarios in one test site and use this site to test ASpider's functionality.
- Unit Testing - Add source code to test the functionality of ASpider's classes and class methods. For example, test the individual parsing methods and HTTP request methods against the HTML Playground.
- Javadoc support - The code is already heavily commented in Javadoc style. It would be nice if more explanation could be added to each method's Javadoc comment block, so that the generated Javadoc is more useful.
- Add a sample module to demonstrate ASpider's ability to only crawl a given website path for data mining purposes.
- Completed - Add Form Submission support for crawling the results of web forms.
- Completed - Add support to automatically follow meta-refresh tags.
- Completed - Add Proxy Support to allow ASpider to use a list of proxies.
- Completed - Add Log4j configuration file support.
- Completed - Add Command Line interface support.
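
Two of the items above, following meta-refresh tags and only crawling a given website path, can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not ASpider's actual code: the class name CrawlSketch and the methods refreshTarget and inScope are hypothetical, only java.util.regex and java.net.URI from the standard library are used, and the regular expression assumes that http-equiv appears before content inside the tag.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are not taken from ASpider's source.
final class CrawlSketch {

    // Matches tags such as:
    //   <meta http-equiv="refresh" content="5; url=http://example.com/next.html">
    // Assumption: http-equiv comes before content. Other attribute orders are not handled.
    private static final Pattern META_REFRESH = Pattern.compile(
        "<meta\\s+[^>]*http-equiv\\s*=\\s*[\"']?refresh[\"']?[^>]*"
        + "content\\s*=\\s*[\"']\\s*\\d+\\s*;\\s*url\\s*=\\s*[\"']?([^\"'>\\s]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the URL a meta-refresh tag points to, or empty if the page has none.
    static Optional<String> refreshTarget(String html) {
        Matcher m = META_REFRESH.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns true if the link is on the given host and under the given path prefix,
    // which is one simple way to keep a crawl inside a single website path.
    static boolean inScope(URI link, String host, String pathPrefix) {
        String linkHost = link.getHost();
        String path = link.getPath();
        if (linkHost == null || path == null) {
            return false; // relative or opaque URIs are rejected in this sketch
        }
        return linkHost.equalsIgnoreCase(host) && path.startsWith(pathPrefix);
    }
}
```

A spider could call refreshTarget on each fetched page and queue the returned URL, and call inScope on every extracted link before queueing it. A real HTML parser would be more robust than a regular expression against malformed pages; the regex is used here only to keep the sketch short.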
Last Modified: Friday, 10-Jun-2005 15:03:04 UTC