Top 20 Obvious Bots
These bots are nice enough to include "bot", "spider", or "crawl" in their user agent string, or to request the robots.txt file. Here are the top 20, representing 89% of obvious bot hits:
- 18% - Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)
- 13% - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- 12% - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
- 7% - Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8B117 Safari/6531.22.7 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
- 5% - Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
- 5% - Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)
- 5% - Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)
- 4% - Twitterbot/1.0
- 3% - Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
- 3% - Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)
- 2% - ShowyouBot (http://showyou.com/crawler)
- 2% - Mozilla/5.0 (compatible; TweetmemeBot/3.0; +http://tweetmeme.com/)
- 2% - Aboundex/0.3 (http://www.aboundex.com/crawler/)
- 2% - Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
- 2% - Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)
- 2% - msnbot/2.0b (+http://search.msn.com/msnbot.htm)
- 1% - Mozilla/5.0 (compatible; PaperLiBot/2.1; http://support.paper.li/entries/20023257-what-is-paper-li)
- 1% - Mozilla/5.0 (compatible; SearchmetricsBot; http://www.searchmetrics.com/en/searchmetrics-bot/)
- 1% - Mozilla/5.0 (compatible; Dow Jones Searchbot)
- 1% - Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
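The "obvious bot" test described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (a substring match on the user agent, or a request for robots.txt), not the exact code behind these numbers. The function name is_obvious_bot and the requested_paths argument are made up for the example.

```python
import re

# The three substrings named in the description above, matched case-insensitively
# so that "Googlebot", "Baiduspider" and "SISTRIX Crawler" all qualify.
BOT_PATTERN = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def is_obvious_bot(user_agent, requested_paths=()):
    """Return True if the client names itself as a bot or fetched robots.txt.

    user_agent      -- raw User-Agent header value (may be empty or None)
    requested_paths -- hypothetical list of URL paths this client requested
    """
    if user_agent and BOT_PATTERN.search(user_agent):
        return True
    # Ignore any query string when checking for a robots.txt request.
    return any(path.split("?", 1)[0] == "/robots.txt" for path in requested_paths)
```

A client such as "curl/7.24.0" fails the substring test, so it only counts as an obvious bot if "/robots.txt" appears among the paths it requested.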
Top 20 Developer Packages or Proprietary Bots
These bots are built on developer packages, but don't specifically identify themselves as bots. The top 20 represent 92% of hits from these bots.
- 19% - checks.panopta.com
- 16% - NING/1.0
- 13% - UnwindFetchor/1.0 (+http://www.gnip.com/)
- 10% - FeedBurner/1.0 (http://www.FeedBurner.com)
- 7% - JS-Kit URL Resolver, http://js-kit.com/
- 5% - UniversalFeedParser/5.0.1 +http://feedparser.org/
- 4% - PycURL/7.19.5
- 3% - Java/1.6.0_26
- 3% - TwitterFeed 3
- 2% - HTMLParser/2.0
- 2% - Ruby
- 2% - Mozilla/5.0 (Digg/1.0; support@digg.com)
- 1% - Java/1.7.0_21
- 1% - Crowsnest/0.5 (+http://www.crowsnest.tv/)
- 1% - curl/7.24.0
- 1% - Opera/7.11 (Windows NT 5.1; U) [en]
- 1% - MetaURI API/2.0 +metauri.com
- 1% - Jakarta Commons-HttpClient/3.1
- 1% - InAGist URL Resolver (http://inagist.com)
- 1% - Mozilla/5.0
Plus these other notables:
- Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; subscribers; feed-id=)
- Mozilla/5.0 (compatible; Embedly/0.2; +http://support.embed.ly/)
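For anyone who wants to build the same kind of ranking from their own logs, here is a rough sketch of how the per-agent percentages and the "top 20 represent N%" figure could be tallied. It assumes you already have one user agent string per hit for a given category. The function name top_shares is invented for this example, and rounding to whole percents is a guess at how the lists above were prepared.

```python
from collections import Counter


def top_shares(user_agents, n=20):
    """Rank user agents by hit count.

    Returns (rows, coverage), where rows is a list of (user_agent, percent)
    for the n most frequent agents and coverage is the percent of all hits
    that those n agents account for. Percentages are rounded to whole numbers.
    """
    counts = Counter(user_agents)
    total = sum(counts.values())
    if total == 0:
        return [], 0
    top = counts.most_common(n)
    rows = [(agent, round(100 * hits / total)) for agent, hits in top]
    coverage = round(100 * sum(hits for _, hits in top) / total)
    return rows, coverage
```

Because each row is rounded separately, the rows will not always add up exactly to the coverage figure.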
Top 20 Sneaky Bots
These bots either don't identify themselves, mask their identity using a common real user agent, or don't include a user agent. I identify these by hits from the same or similar IP addresses, complete lack of any referring URLs, or too many hits from the same IP address.
- From IP 168.62.192.113 (Microsoft) with user agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.163 Safari/535.19".
- More coming soon
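The heuristics for spotting sneaky bots can be approximated with a small script. This is a sketch under several assumptions: hits are available as (ip, referrer) pairs, a missing referrer is logged as an empty string or "-", "similar" IP addresses means the same first three octets of an IPv4 address, and both thresholds are illustrative guesses rather than values I actually used. The names flag_sneaky and subnet_of are made up for the example.

```python
from collections import Counter


def subnet_of(ip):
    """Group 'similar' IPv4 addresses by their first three octets (assumed definition)."""
    return ip.rsplit(".", 1)[0]


def flag_sneaky(hits, max_hits=1000, min_hits_no_referrer=20, key=lambda ip: ip):
    """Flag IPs (or IP groups) whose traffic looks automated.

    hits -- iterable of (ip, referrer) pairs; referrer is "", "-" or None when absent
    key  -- maps an IP to its group; pass subnet_of to pool similar addresses

    Returns {group: [reasons]}. Both thresholds are hypothetical defaults.
    """
    total = Counter()
    with_referrer = Counter()
    for ip, referrer in hits:
        group = key(ip)
        total[group] += 1
        if referrer not in ("", "-", None):
            with_referrer[group] += 1

    flagged = {}
    for group, count in total.items():
        reasons = []
        if count > max_hits:
            reasons.append("too many hits")
        # Require a minimum number of hits so one direct visit is not flagged.
        if count >= min_hits_no_referrer and with_referrer[group] == 0:
            reasons.append("no referrers")
        if reasons:
            flagged[group] = reasons
    return flagged
```

Calling flag_sneaky(hits, key=subnet_of) pools neighboring addresses, which helps when a bot spreads its requests across a small block of IPs.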