What is a Web Spider?

Web servers usually do not expect spiders to scrape their data. Some websites, however, want their content scraped so that it spreads more widely (bank websites' real estate listings, business listings, online advertisers).

Some websites contain content that is free to take because it is public information (census data, tax rolls, court documents, government documents).

Some websites display uncopyrightable information yet do not want people to spider them. These may be your competitors, or large search engines and websites that block spiders. They want to spend their bandwidth only on live people viewing content, not on data-collecting spiders. Some do not want their data saved into a database for other websites to use. Dealing with this situation takes us to a whole new area of spidering: Stealth Spidering.
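
One conventional way a site states which pages it does not want spidered is a robots.txt file. The text above does not mention robots.txt, so treat this as an illustrative assumption rather than part of the original discussion. The sketch below uses Python's standard urllib.robotparser module to test URLs against a set of rules. The rules, the "MySpider" user-agent name, the example.com URLs, and the is_allowed helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real spider would download this
# from the site's /robots.txt instead of hard-coding it.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one outside and one inside the disallowed path.
    for url in ("http://example.com/about", "http://example.com/listings/42"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "MySpider", url) else "blocked"
        print(url, verdict)
```

A robots.txt file only expresses the site's wishes. Sites that actively block spiders may use other measures that a check like this cannot detect.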
