If you access and study your web server logs, you will find that more than 50% of the traffic comes from bots, both search engine crawlers and spam bots. Recently we have been building a system to analyze our website's traffic data, and there seems to be a big difference between Google Analytics and our own reporting. Even after removing the bot traffic from our data, our numbers are still much higher than those in Google Analytics.
It is very easy to identify search engine crawlers such as those from Google, Yahoo and Bing by analyzing the user agent field. But many spam bots never reveal their identity: their user agent string looks the same as that of a common browser.
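Here is a minimal sketch of user-agent-based classification. It assumes the logs are in the Apache/Nginx "combined" log format, where the user agent is the last quoted field. The token list (`googlebot`, `bingbot`, `slurp` for Yahoo) is illustrative rather than complete, and the helper names are hypothetical. As noted above, this only catches bots that identify themselves; a spam bot sending a browser user agent passes straight through.

```python
import re

# Illustrative, not exhaustive: substrings that major search-engine crawlers
# include in their User-Agent header (Yahoo's crawler calls itself "Slurp").
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")

# Assumed Apache/Nginx "combined" log format:
# ip ident user [time] "request" status size "referer" "user-agent"
COMBINED_LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = COMBINED_LOG_RE.match(line)
    return match.groupdict() if match else None


def is_known_crawler(user_agent):
    """True if the user agent contains one of the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)
```

Lines for which `is_known_crawler(parse_line(line)["agent"])` is true can be filtered out before counting visits. Because the user agent is supplied by the client, a request claiming to be Googlebot is not proof that it really is.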
We are still struggling to block all unwanted bots. Some of our ideas are:
- Adding a JavaScript snippet to determine the source of the traffic
- Blocking IPs that hit the server very frequently
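The second idea, blocking IPs that hit the server too often, could start from a sliding-window counter like the sketch below. The class name and the thresholds are hypothetical, and it assumes log entries are fed in timestamp order (seconds since some epoch).

```python
from collections import defaultdict, deque


class IpRateTracker:
    """Flags an IP once it makes more than max_hits requests within
    window_seconds. Timestamps must be fed in non-decreasing order."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window = window_seconds
        # Per-IP queue of recent request timestamps, oldest first.
        self.hits = defaultdict(deque)

    def record(self, ip, timestamp):
        """Record one request and return True if the IP is over the limit."""
        queue = self.hits[ip]
        queue.append(timestamp)
        # Keep only hits in the window (timestamp - window, timestamp].
        while queue and queue[0] <= timestamp - self.window:
            queue.popleft()
        return len(queue) > self.max_hits
```

For example, with `IpRateTracker(max_hits=3, window_seconds=10)`, an IP requesting at seconds 0, 1, 2 and 3 is flagged on the fourth request. One caveat with pure IP blocking is that many real users can share one address behind a proxy or NAT, so thresholds need to be set with that in mind.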