What is a WWW robot?
A robot is a program that automatically traverses the Web’s hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
Note that “recursive” here doesn’t limit the definition to any specific traversal algorithm; even if a robot applies some heuristic to the selection and order of documents to visit and spaces out requests over a long period of time, it is still a robot.
Normal Web browsers are not robots, because they are operated by a human, and don’t automatically retrieve referenced documents (other than inline images).
Web robots are sometimes referred to as Web Wanderers, Web Crawlers, or Spiders. These names are a bit misleading, as they give the impression that the software itself moves between sites like a virus. This is not the case: a robot simply visits sites by requesting documents from them.
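The traversal described above can be sketched in a few lines of Python. This is only an illustration, not how any particular robot is implemented: the names `LinkExtractor`, `fetch_over_http`, `crawl`, and the `max_documents` limit are all made up for this example, and a breadth-first order is just one of many possible traversal orders. The sketch also leaves out the courtesy measures a real robot needs, such as spacing out its requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_over_http(url):
    """Default fetcher (hypothetical helper): return a document body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_over_http, max_documents=50):
    """Retrieve start_url, then every document it references, and so on.

    Returns the URLs in the order they were retrieved.
    """
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_documents:
        url = queue.popleft()
        try:
            body = fetch(url)
        except Exception:
            continue  # documents that cannot be retrieved are skipped
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Note that the robot never leaves the machine it runs on: every "visit" is just a call to `fetch`, which requests one document from a server. Passing a different `fetch` function makes it possible to try the traversal on in-memory pages without touching the network.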
What is an agent?
The word “agent” has several meanings in computing these days. Specifically:
- Autonomous agents are programs that do travel between sites, deciding themselves when to move and what to do. These can only travel between special servers and are currently not widespread on the Internet.
- Intelligent agents are programs that help users with things, such as choosing a product, or guiding a user through form filling, or even helping users find things. These generally have little to do with networking.
- User-agent is a technical name for programs that perform networking tasks for a user, such as Web user-agents like Netscape Navigator and Microsoft Internet Explorer, and email user-agents like Qualcomm Eudora.
What is a search engine?
A search engine is a program that searches through some dataset. In the context of the Web, the term “search engine” is most often used for search forms that search through databases of HTML documents gathered by a robot.
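As a rough illustration of searching a database of documents gathered by a robot, here is a minimal sketch of an inverted index in Python. It assumes the documents have already been retrieved and reduced to plain text; the function names `build_index` and `search` are invented for this example, and real search engines use far more elaborate ranking and storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of URLs whose text contains it.

    `documents` is a dict of {url: plain_text}, e.g. as gathered by a robot.
    """
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

The index is built once from whatever the robot collected; the search form then only consults the index and never contacts the original sites at query time.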
What kinds of robots are there?
Robots can be used for a number of purposes:
- Indexing
- HTML validation
- Link validation
- “What’s New” monitoring
- Mirroring
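To make one of these purposes concrete, link validation could be sketched roughly as follows in Python. This is an assumption-laden illustration rather than a description of any existing validator: `AnchorCollector`, `http_status`, and `find_broken_links` are hypothetical names, and treating any status of 400 or above (or no response at all) as "broken" is a simplification.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def http_status(url):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code  # the server answered, but with an error status
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def find_broken_links(page_url, html, status_of=http_status):
    """Return (url, status) pairs for links on the page that look broken."""
    collector = AnchorCollector()
    collector.feed(html)
    broken = []
    for href in collector.hrefs:
        target = urljoin(page_url, href)
        if urlparse(target).scheme not in ("http", "https"):
            continue  # skip mailto:, ftp:, and similar links
        status = status_of(target)
        if status is None or status >= 400:
            broken.append((target, status))
    return broken
```

The `status_of` parameter exists only so the check can be exercised against a stub instead of live servers.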