Latest version is 0.1a
To Do List
- Add a GUI - ASpider currently takes input only from command-line arguments. It would be nice (and rather easy) to put a simple java.awt GUI on top of it.
- Binary file parsing - Adding support for parsing binary files found on the web (PDF, Office documents, Shockwave, audio file tags and any other binary format) would be a big help. Open-source tools for parsing these formats already exist; we just need to add support for them to ASpider.
- JavaScript interpreter - Mozilla has an open-source JavaScript interpreter that, as far as I know, no spider has yet managed to integrate and use. Adding even crude support for it would be a huge step forward for ASpider and for web spidering in general.
- HTML Playground - Add HTML pages to ASpider's existing HTML playground test site covering all possible combinations of frames, links, tags, malformed HTML and HTML anomalies, including style sheets, JavaScript and meta tags in all their various forms. The idea is to build every possible HTML scenario into one test site and use that site to test ASpider's functionality.
- Unit Testing - Add source code that tests the functionality of ASpider's classes and methods, for example testing the individual parsing methods and HTTP request methods against the HTML Playground.
- Javadoc support - The code already carries plenty of javadoc-style comments. It would be nice to add more explanation to each method's javadoc comment block so the generated javadoc is complete and readable.
- Add a sample module demonstrating that ASpider can be restricted to crawling only a given website path, for data-mining purposes.
- Completed - Add form submission support for crawling the results of web forms.
- Completed - Add support for automatically following meta-refresh tags.
- Completed - Add proxy support so ASpider can use a list of proxies.
- Completed - Add Log4j configuration file support.
- Completed - Add command-line interface support.
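For the GUI item, here is a minimal java.awt sketch. It assumes the GUI is only a front end that collects the values a user would otherwise type on the command line and hands them over as an argument array. The class name `SpiderGui`, the `-depth` flag and the hand-off to an `ASpider.main(args)` entry point are hypothetical, not ASpider's actual command-line syntax.

```java
import java.awt.BorderLayout;
import java.awt.Button;
import java.awt.Frame;
import java.awt.Label;
import java.awt.Panel;
import java.awt.TextField;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical java.awt front end that builds an argv-style array for the crawler. */
final class SpiderGui {

    /** Turns the form fields into command-line style arguments (flag names are hypothetical). */
    static String[] toArgs(String startUrl, String depth) {
        List<String> args = new ArrayList<>();
        args.add(startUrl.trim());
        if (!depth.trim().isEmpty()) {
            args.add("-depth"); // hypothetical flag, not ASpider's documented CLI
            args.add(depth.trim());
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        Frame frame = new Frame("ASpider");
        TextField url = new TextField("http://example.com/", 40);
        TextField depth = new TextField("2", 4);
        Button start = new Button("Start crawl");

        Panel fields = new Panel(); // default FlowLayout lays the widgets out in a row
        fields.add(new Label("Start URL:"));
        fields.add(url);
        fields.add(new Label("Depth:"));
        fields.add(depth);

        start.addActionListener(e -> {
            String[] args = toArgs(url.getText(), depth.getText());
            // Hand off to the existing command-line entry point here, e.g. a
            // hypothetical ASpider.main(args), preferably on a worker thread.
            System.out.println("Would start crawl with: " + String.join(" ", args));
        });

        frame.add(fields, BorderLayout.CENTER);
        frame.add(start, BorderLayout.SOUTH);
        frame.addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent e) {
                frame.dispose(); // close the window when the user clicks the close box
            }
        });
        frame.pack();
        frame.setVisible(true);
    }
}
```

Keeping `toArgs` separate from the widgets means the existing command-line path stays the single entry point, and the conversion can be tested without a display.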
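For the binary-parsing item, a first step could be deciding which parser a fetched file should go to. The sketch below is an assumption rather than ASpider's design: it checks well-known leading signature bytes ("%PDF-" for PDF, the OLE2 header used by older Office files, "PK" followed by bytes 03 04 for ZIP-based formats such as .docx, "FWS"/"CWS"/"ZWS" for Shockwave Flash, "ID3" for MP3 tags). `BinarySniffer` is a hypothetical name; the real parsing would be handed to existing open-source libraries (for example Apache PDFBox for PDF or Apache POI for Office files).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper that guesses a binary format from its leading bytes. */
final class BinarySniffer {

    enum Kind { PDF, OLE2_OFFICE, ZIP_OR_OOXML, SHOCKWAVE, MP3_ID3, UNKNOWN }

    // Header of OLE2 compound documents (.doc, .xls, .ppt before Office 2007).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the guessed format, or UNKNOWN (e.g. for HTML or plain text). */
    static Kind sniff(byte[] data) {
        if (startsWith(data, ascii("%PDF-"))) return Kind.PDF;
        if (startsWith(data, OLE2)) return Kind.OLE2_OFFICE;
        if (startsWith(data, new byte[] {'P', 'K', 3, 4})) return Kind.ZIP_OR_OOXML;
        if (startsWith(data, ascii("FWS")) || startsWith(data, ascii("CWS"))
                || startsWith(data, ascii("ZWS"))) return Kind.SHOCKWAVE;
        if (startsWith(data, ascii("ID3"))) return Kind.MP3_ID3;
        return Kind.UNKNOWN;
    }
}
```

The server's Content-Type header could be checked first; sniffing the bytes mainly helps when that header is missing or wrong.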
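The JavaScript item is about embedding a real interpreter (presumably Mozilla's Java-based Rhino). As a much cruder stopgap, and explicitly not interpretation, a spider can pattern-match common navigation idioms in script text. This sketch only catches string literals assigned to `location` / `location.href` or passed to `window.open`; `ScriptLinkScanner` is a hypothetical name, and any URL computed at run time (concatenation, variables) is missed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex stopgap: pulls literal URLs out of common JavaScript idioms. */
final class ScriptLinkScanner {

    // Matches e.g.  location = "x",  window.location.href='x',  window.open("x", ...)
    // Group 1 is the opening quote, group 2 the URL up to the matching quote.
    // Comparisons such as  location == 'x'  do not match.
    private static final Pattern LINK = Pattern.compile(
            "(?:location(?:\\.href)?\\s*=\\s*|window\\.open\\s*\\(\\s*)([\"'])(.*?)\\1");

    static List<String> findLinks(String script) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(script);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }
}
```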
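For the unit-testing item, a test might feed a saved HTML Playground page to a parsing method and compare the links it finds with the expected list. This sketch uses plain Java so it runs without extra libraries (a real suite would more likely use JUnit). `extractHrefs` is a hypothetical regex-based stand-in for one of ASpider's parsing methods, and the page is inlined instead of being fetched from the playground site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Dependency-free test sketch; the method under test is a hypothetical stand-in. */
final class LinkParserTest {

    // Case-insensitive href with double, single or no quotes around the value.
    private static final Pattern HREF =
            Pattern.compile("(?i)href\\s*=\\s*([\"']?)([^\"'\\s>]+)\\1");

    static List<String> extractHrefs(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    // Inline copy of a small "playground" page: mixed quoting and upper-case tags.
    static final String PAGE =
            "<a href=\"a.html\">A</a> <A HREF='b.html'>B</A> <a href=c.html>C</a>";

    /** Runs the checks and returns the number of failures. */
    static int run() {
        int failures = 0;
        List<String> expected = Arrays.asList("a.html", "b.html", "c.html");
        List<String> actual = extractHrefs(PAGE);
        if (!expected.equals(actual)) {
            System.err.println("extractHrefs: expected " + expected + " but got " + actual);
            failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = run();
        System.out.println(failures == 0 ? "all tests passed" : failures + " test(s) failed");
        if (failures > 0) {
            System.exit(1);
        }
    }
}
```

Each new page added to the playground would get its own expected-links list, so a regression in a parser shows up as a named failure.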
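For the sample data-mining module, the core piece is a scope check applied to every discovered link before it is queued. A minimal sketch, assuming "a given website path" means the same host plus a directory prefix; `PathScope` is a hypothetical name. Links are normalized first so that something like `/docs/../admin/` cannot escape the prefix.

```java
import java.net.URI;

/** Hypothetical helper: accepts only links on one host, under one directory. */
final class PathScope {

    private final String host;
    private final String dirPrefix;

    PathScope(String rootUrl) {
        URI root = URI.create(rootUrl).normalize();
        this.host = root.getHost();
        String path = root.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // Keep everything up to and including the last '/', so
        // "http://example.com/docs/index.html" gives the prefix "/docs/".
        this.dirPrefix = path.substring(0, path.lastIndexOf('/') + 1);
    }

    /** True if the link is on the same host and inside the directory prefix. */
    boolean inScope(String url) {
        final URI u;
        try {
            // normalize() removes "." and ".." segments before the prefix check.
            u = URI.create(url).normalize();
        } catch (IllegalArgumentException badUrl) {
            return false; // unparseable links are simply skipped
        }
        String linkHost = u.getHost();
        String path = u.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        return linkHost != null && linkHost.equalsIgnoreCase(host)
                && path.startsWith(dirPrefix);
    }
}
```

Because the prefix keeps its trailing slash, a scope rooted at `/docs/` does not also admit `/docs-old/`.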
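Form submission (already completed) comes down to encoding a form's fields as `application/x-www-form-urlencoded`. The sketch below shows that encoding with `java.net.URLEncoder`, assuming UTF-8 and that field order should follow the form; `FormEncoder` is a hypothetical helper, not ASpider's implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for turning form fields into a request body or query string. */
final class FormEncoder {

    /** Encodes fields as application/x-www-form-urlencoded, in the map's iteration order. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            body.add(enc(f.getKey()) + "=" + enc(f.getValue()));
        }
        return body.toString();
    }

    /** For method="get" forms the encoded fields become (or extend) the query string. */
    static String getUrl(String action, Map<String, String> fields) {
        String sep = action.contains("?") ? "&" : "?";
        return action + sep + encode(fields);
    }

    private static String enc(String s) {
        try {
            // Spaces become '+', reserved characters such as '&' and '=' are %-escaped.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For `method="post"` forms the same string would be written as the request body, for example through an `HttpURLConnection` with `setDoOutput(true)` and a `Content-Type: application/x-www-form-urlencoded` header. Passing a `LinkedHashMap` keeps the fields in form order.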
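Following meta-refresh tags means finding `<meta http-equiv="refresh" content="N; url=...">` and extracting the target. This is a regex-based sketch that assumes quoted attribute values; `MetaRefresh` is a hypothetical name, and a real HTML parser would cope better with malformed markup. A refresh with only a delay and no URL reloads the same page, so the sketch returns empty for it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that finds the target of a meta-refresh tag. */
final class MetaRefresh {

    private static final Pattern META = Pattern.compile("(?i)<meta\\b[^>]*>");
    private static final Pattern EQUIV =
            Pattern.compile("(?i)http-equiv\\s*=\\s*[\"']?\\s*refresh");
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    private static final Pattern CONTENT =
            Pattern.compile("(?i)content\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");
    // "5; url=next.html", "0;URL='x.html'" or "5; next.html": delay, separator, optional "url=".
    private static final Pattern TARGET = Pattern.compile(
            "(?i)^\\s*\\d+(?:\\.\\d*)?\\s*[;,]\\s*(?:url\\s*=\\s*)?['\"]?([^'\"]+)");

    static Optional<String> target(String html) {
        Matcher meta = META.matcher(html);
        while (meta.find()) {
            String tag = meta.group();
            if (!EQUIV.matcher(tag).find()) {
                continue; // some other meta tag, e.g. description or keywords
            }
            Matcher c = CONTENT.matcher(tag);
            if (!c.find()) {
                continue;
            }
            String content = c.group(2) != null ? c.group(2) : c.group(3);
            Matcher t = TARGET.matcher(content);
            if (t.find()) {
                return Optional.of(t.group(1).trim());
            }
        }
        return Optional.empty();
    }
}
```

The returned target is often relative, so it would still need to be resolved against the page URL, for example with `java.net.URI.resolve`.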
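For proxy support, one simple policy is round-robin over the configured list. The sketch assumes the list is given as `host:port` strings and that every entry is an HTTP proxy; `ProxyRotator` is a hypothetical class, and ASpider's actual selection policy may differ.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin proxy selector. */
final class ProxyRotator {

    private final List<Proxy> proxies;
    private final AtomicInteger next = new AtomicInteger();

    /** Builds HTTP proxies from "host:port" strings (an assumed list format). */
    ProxyRotator(List<String> hostPorts) {
        List<Proxy> parsed = new ArrayList<>();
        for (String hp : hostPorts) {
            int colon = hp.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected host:port but got " + hp);
            }
            String host = hp.substring(0, colon).trim();
            int port = Integer.parseInt(hp.substring(colon + 1).trim());
            // createUnresolved avoids a DNS lookup while the list is being loaded.
            parsed.add(new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port)));
        }
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("proxy list is empty");
        }
        this.proxies = parsed;
    }

    /** Returns the proxies in turn; safe to call from several crawler threads. */
    Proxy nextProxy() {
        // floorMod keeps the index non-negative even after the counter overflows.
        return proxies.get(Math.floorMod(next.getAndIncrement(), proxies.size()));
    }
}
```

Each request could then be opened with `url.openConnection(rotator.nextProxy())`. The `AtomicInteger` counter keeps the rotation consistent when several crawler threads share one rotator.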