What is a Web Spider?


If the desired content is simple enough to identify by pattern (such as email addresses, URL links, or possibly mailing addresses), then a dumb spider can be used. But if the data is more abstract, like stock quotes, people's names, or other info, then pseudo-logic has to be put in place to grab the desired content. The trick here is to parse the page in a generic way so you can build more generic and predictable pseudo-logic to collect the data.
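To make the "dumb spider" case concrete, here is a minimal sketch in Python using only the standard library. It pulls email addresses and link targets out of a page that has already been downloaded. The names `EMAIL_RE`, `LinkCollector`, and `extract_simple` are made up for this illustration, and the email pattern is deliberately loose, so treat it as a starting point rather than a complete solution.

```python
import re
from html.parser import HTMLParser

# Deliberately loose pattern (an assumption for this sketch);
# real email syntax is more permissive than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_simple(html):
    """Return (emails, links) found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    # De-duplicate and sort the emails so the result is predictable.
    emails = sorted(set(EMAIL_RE.findall(html)))
    return emails, collector.links


# Hypothetical page content, for illustration only.
sample = (
    '<p>Contact <a href="mailto:info@example.com">info@example.com</a> '
    'or see <a href="/about.html">About</a>.</p>'
)
emails, links = extract_simple(sample)
```

No page-specific logic is needed here: the same function works on any page, which is what makes this kind of content easy. Stock quotes or people's names have no such universal pattern, so they need rules tied to how a particular page is laid out.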

 

Also, the content typically hides behind an online search form, and the form may need to be submitted with key settings to get the right content. This is very difficult to automate. Let's just say a form whose default values return the desired content is a screen scraper's best friend.
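When the form's fields are known ahead of time, submitting it from a script might look like the sketch below, again in standard-library Python. The URL and the field names `symbol` and `range` are hypothetical, as is the helper `build_search_request`. The sketch also assumes the form posts ordinary URL-encoded fields. The hard part described above, working out which fields and values a given form actually needs, is not something this code solves.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(form_url, fields):
    """Build a POST request that mimics submitting a search form."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        form_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical form URL and field names, for illustration only.
request = build_search_request(
    "http://example.com/search",
    {"symbol": "ACME", "range": "1d"},
)

# Actually sending it would be:
#   from urllib.request import urlopen
#   html = urlopen(request).read()
```

The request is only built here, not sent, so the sketch can be tried without touching a real site. If a form returns the desired content with its default values, none of this is needed and fetching the results page directly is enough.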
