UEH

Urobot

Urobot is a Ruby Gem that gives a fairly accurate answer to the question of whether a user-agent string belongs to a (human) browser or to a bot. It does so by examining a user-agent string to see if it looks like a legitimate browser's. If it does not, Urobot assumes the string belongs to a bot. This also means that feed readers, email clients, multimedia players, downloaders, etc. are all identified as bots. Urobot's true function is therefore best described as distinguishing human visitors from automated processes.

Urobot takes a whitelist + blacklist approach to identifying bots. First, it uses a whitelist to flag any user-agent string that does not match a known browser as a bot. Then it uses a blacklist to catch known bots that pretend to be legitimate browsers.
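To make the two passes concrete, here is a minimal Ruby sketch of a whitelist + blacklist check. It is not Urobot's code or API: the module name `UserAgentSketch`, the method `bot?`, and both pattern lists are hypothetical illustrations, and a real whitelist would need far more patterns than these.

```ruby
# Hypothetical sketch of a whitelist + blacklist user-agent check.
# None of these names or patterns come from Urobot; they are assumptions
# chosen only to illustrate the two-pass idea.
module UserAgentSketch
  # Whitelist: shapes a mainstream browser user-agent string is assumed to have.
  BROWSER_PATTERNS = [
    %r{\AMozilla/5\.0 \(.+\) AppleWebKit/[\d.]+ \(KHTML, like Gecko\)},
    %r{\AMozilla/5\.0 \(.+\) Gecko/\d+ Firefox/[\d.]+\z},
    %r{\AMozilla/[45]\.0 \(compatible; MSIE [\d.]+; Windows}
  ].freeze

  # Blacklist: tokens assumed to betray a bot that mimics a browser string.
  BOT_TOKENS = /bot|crawl|spider|slurp|headless|phantomjs/i

  # Returns true if the string should be treated as an automated process.
  def self.bot?(user_agent)
    ua = user_agent.to_s.strip
    # Pass 1 (whitelist): anything that does not look like a known browser is a bot.
    return true unless BROWSER_PATTERNS.any? { |pattern| pattern.match?(ua) }

    # Pass 2 (blacklist): a browser-shaped string containing a bot token is a bot.
    BOT_TOKENS.match?(ua)
  end
end

UserAgentSketch.bot?("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36") # => false
UserAgentSketch.bot?("Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)") # => true (fails the whitelist)
UserAgentSketch.bot?("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36") # => true (caught by the blacklist)
```

Note that the feed reader is flagged simply because it does not look like a browser, which matches the behaviour described above. The sketch also shares the blind spots of the approach: a bot sending an unmodified Chrome string passes both passes, and a browser with a rewritten string fails the first one.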

The whitelist + blacklist approach makes Urobot fairly low-maintenance, but also vulnerable to false positives and negatives. It cannot detect bots that spoof their user-agent string to a perfectly valid one, nor can it correctly identify a browser whose user-agent string has been modified to a nonsensical one.

With that in mind, Urobot has been developed against a list of over 30,000 user-agent strings (rare browsers, various permutations of known browsers, XSS attacks, known bots, etc.) based on the content provided by the following sites:

plus several additional strings pulled from my sites' Apache logs.

So Urobot has been well tested and should be right in the majority of cases. Be aware, however, that you should only use it in situations where false positives and negatives are acceptable.

Go to GitHub to see the full documentation and source code.
