Googlebot encountered an extremely high number of URLs from your site. This could cause Googlebot to unnecessarily crawl a large number of distinct URLs that point to identical or similar content, or to crawl undesired parts of your site. As a result, Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.

Common causes of this problem

  • Problematic parameters in the URL. Session IDs or sorting methods, for example, can create massive amounts of duplication and a greater number of URLs. Similarly, a dynamically generated calendar might generate links to future and previous dates with no restrictions on start or end dates.
  • Additive filtering of a set of items. Many sites provide different views of the same set of items or search results. Combining filters (for example, hotels that are on the beach, are dog-friendly, AND have a fitness center) can result in a huge number of mostly redundant URLs.
  • Dynamic generation of documents as a result of counters, timestamps, or advertisements.
  • Broken relative links. These can often cause infinite spaces; frequently, the problem arises because of repeated path elements (see the sketch after the example URL). For example:
    http://www.example.com/index.shtml/discuss/category/school/061121/html/interview/category/health/070223/html/category/business/070302/html/category/community/070413/html/FAQ.htm
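
    As a rough illustration of how repeated path elements arise (the page path and link target below are hypothetical, not taken from the article), a relative link that omits the leading slash resolves against the current page's path, so each crawled page can spawn an ever deeper URL:

      <!-- Page served at http://www.example.com/discuss/category/school/061121/ -->
      <!-- The href lacks a leading slash, so it resolves relative to that path -->
      <a href="html/FAQ.htm">FAQ</a>
      <!-- Resolves to http://www.example.com/discuss/category/school/061121/html/FAQ.htm -->
      <!-- If that page repeats the same relative links, the path keeps compounding -->
      <!-- Writing href="/html/FAQ.htm" (or an absolute URL) avoids the problem -->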

Steps to resolve this problem

To avoid potential problems with URL structure, we recommend the following:

  • Whenever possible, shorten URLs by trimming unnecessary parameters. Use the Parameter Handling tool to indicate which URL parameters Google can safely ignore. Make sure to use these cleaner URLs for all internal links. Consider redirecting unnecessarily long URLs to their cleaner versions, or using the rel="canonical" link element to specify the preferred, shorter canonical URL (see the first sketch after this list).
  • Wherever possible, avoid the use of session IDs in URLs. Consider using cookies instead. Check our Webmaster Guidelines for additional information.
  • If your site has an infinite calendar, add a nofollow attribute to links to dynamically created future calendar pages (an example appears after this list).
  • Check your site for broken relative links.
  • If none of the above is possible, consider using a robots.txt file to block Googlebot’s access to problematic URLs. Typically, you should consider blocking dynamic URLs, such as URLs that generate search results, or URLs that can create infinite spaces, such as calendars. Using wildcards in your robots.txt file can allow you to easily block large numbers of URLs, as in the sketch at the end of this list.
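
A minimal sketch of the rel="canonical" approach (the URLs here are hypothetical): the link element goes in the <head> of the long, parameter-laden page and points Google at the preferred short version. A server-side 301 redirect to the cleaner URL is the alternative mentioned above.

    <!-- Served on the long URL, e.g. http://www.example.com/hotels?sessionid=123&sort=price -->
    <link rel="canonical" href="http://www.example.com/hotels" />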
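For the infinite-calendar case, a hedged example of the nofollow attribute on a link to a dynamically generated future month (the path and parameter name are assumptions, not from the article):

    <a href="/calendar?month=2024-07" rel="nofollow">Next month &raquo;</a>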
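And a sketch of a robots.txt file using wildcards to block the kinds of problematic URLs described above (the paths and the sessionid parameter name are assumptions used for illustration):

    User-agent: Googlebot
    # Block internal search-result pages
    Disallow: /search
    # Block any URL that carries a session ID parameter
    Disallow: /*?*sessionid=
    # Block dynamically generated calendar pages
    Disallow: /calendar/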
