What is a Web Spider?


Stealth spidering without using a web browser involves several techniques.

> IP blocking.  A good solution is to use proxies.  Open proxies exist all over the internet, and by using them a spider can look like 1000 individual users surfing a web site.  Of course, using the same proxy over and over could get it flagged.  Take care to randomize the order in which the links are spidered, the number of links each proxy fetches, and so on.
> The idea is to have each proxy look like a separate individual user. 
Unknown proxies can also be unreliable.  The library that manages the proxies should also track the status of each one, so that bad proxies are not reused.
> Modifying the User-Agent header is critical to appearing like an ordinary web surfer.
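
The points above can be sketched in a few lines of Python using only the standard library.  This is a rough illustration, not a finished spider: the `ProxyPool` class, the `plan_crawl` and `fetch` helpers, the failure threshold, and the User-Agent strings are all assumptions made for the example, and the proxy addresses you pass in are your own.

```python
import random
import urllib.request

# Hypothetical browser-like User-Agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
]


class ProxyPool:
    """Tracks proxy status and hands out working proxies at random."""

    def __init__(self, proxies, max_failures=3, rng=None):
        self.failures = {p: 0 for p in proxies}  # consecutive failures per proxy
        self.max_failures = max_failures         # assumed threshold, tune as needed
        self.rng = rng or random.Random()

    def alive(self):
        """Proxies that have not yet hit the failure threshold."""
        return [p for p, n in self.failures.items() if n < self.max_failures]

    def pick(self):
        candidates = self.alive()
        if not candidates:
            raise RuntimeError("no working proxies left")
        return self.rng.choice(candidates)

    def report(self, proxy, ok):
        # A success resets the counter; a failure moves the proxy toward retirement.
        self.failures[proxy] = 0 if ok else self.failures[proxy] + 1


def plan_crawl(links, pool):
    """Shuffle the link order and pair each link with a randomly chosen proxy."""
    order = list(links)
    pool.rng.shuffle(order)
    return [(link, pool.pick()) for link in order]


def fetch(url, proxy, pool, timeout=10):
    """Fetch url through proxy with a browser-like User-Agent and record the outcome."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    headers = {"User-Agent": pool.rng.choice(USER_AGENTS)}
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request, timeout=timeout) as response:
            body = response.read()
    except OSError:
        # URLError and socket timeouts are OSError subclasses.  In this sketch
        # HTTP error responses are also counted against the proxy.
        pool.report(proxy, ok=False)
        return None
    pool.report(proxy, ok=True)
    return body
```

A spider would call `plan_crawl` once per batch of links and then `fetch` each pair, ideally with a random pause between requests.  Because a random choice gives each proxy an uneven share of the links, no two proxies show the same browsing pattern.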

Conversely, it's possible to use a web browser to surf a website.  The ideal web spider would automate an actual web browser in concert with a packet sniffer like Snort to collect the data.  Using Windows COM libraries, you can easily automate a Windows program.
