Many factors affect the crawling of a site, including (but not limited to):
- The total number of pages on a site (is the site small, large, or somewhere in-between?)
- The size of the content (PDFs and Microsoft Office files are typically much larger than regular HTML files)
- The freshness of the content (how often is content added/removed/changed?)
- The number of allowed concurrent connections (a function of the web server infrastructure)
- The bandwidth of the site (a function of the host's service provider; the lower the bandwidth, the lower the server's capacity to serve page requests)
- How highly the site ranks (content judged as not relevant won't be crawled as often as highly relevant content)
The rate at which a site is crawled is an amalgam of all of those factors and more. If a site is highly ranked and has a ton of pages, more of those pages will be indexed, which means it needs to be crawled more thoroughly (and that takes time). If the site's content is regularly updated, it'll be crawled more often to keep the index fresh, which better serves search customers (as well as the goals of the site's webmasters).
Because so many factors are involved in the crawl rate, there is no clear, generic answer as to whether you should set a crawl delay, and how long it takes to finish a crawl of a site also depends on the factors above. The bottom line is this: if webmasters want their content to be included in the index, it has to be crawled. There are only 86,400 seconds in a day (leap seconds excluded!), so any delay imposed upon the bot will only reduce the amount and the freshness of the content placed into the index on a daily basis.
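To make that arithmetic concrete, here is a minimal Python sketch of the hard ceiling a crawl delay puts on daily fetches. It assumes the bot fetches one page per delay interval over a single connection, and the delay values shown are illustrative, not recommendations:

```python
# Upper bound on pages a crawler can fetch per day under a crawl delay,
# assuming one fetch per delay interval over a single connection
# (delay values are illustrative, not recommendations).
SECONDS_PER_DAY = 86_400

for delay in (1, 5, 10, 30):
    max_fetches = SECONDS_PER_DAY // delay
    print(f"Crawl-delay: {delay:>2}s -> at most {max_fetches:,} fetches/day")
```

With a 10-second delay, for example, the bot can make at most 8,640 requests per day against the site, no matter how much content is waiting to be crawled.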
That said, some webmasters have technical reasons of their own for needing a crawl delay option. As such, we want to explain how to set one, what values you can choose, and the implications of doing so.
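For crawlers that honor it, the usual mechanism is a Crawl-delay line in robots.txt. As a sketch of how a well-behaved bot reads and obeys that directive, the snippet below uses Python's standard-library `urllib.robotparser` against a hypothetical robots.txt; the 10-second value, the `ExampleBot` name, and the example.com URLs are assumptions for illustration only:

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all bots to wait 10 seconds between
# requests (the value and paths are illustrative).
sample_robots_txt = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(sample_robots_txt.splitlines())

delay = rp.crawl_delay("ExampleBot")  # matches the wildcard entry -> 10
print(f"Requested crawl delay: {delay} seconds")

# A polite fetch loop then sleeps between requests:
for url in ("https://example.com/a", "https://example.com/b"):
    if rp.can_fetch("ExampleBot", url):
        print(f"fetching {url}")  # a real crawler would issue the request here
        time.sleep(delay or 0)    # honor the delay if one was set
```

Note that support for Crawl-delay varies by crawler (some engines let you throttle through their webmaster tools instead), so it is best treated as a request rather than a guarantee.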