Shy Guys: A Light-Weight Approach to Detecting Robots on Websites
Rémi Van Boxem, Tom Barbette, Cristel Pelsser, Ramin Sadre

TL;DR
This paper introduces a lightweight, passive method for detecting web bots that combines user-agent and favicon heuristics. It operates solely on server logs and detects 67.7% of bot traffic at a 3% false-positive rate.
Contribution
The authors present a novel, resource-efficient bot detection technique that outperforms existing methods by combining user-agent analysis with favicon heuristics without client-side interaction.
Findings
Detects 67.7% of bot traffic with 3% false positives
Operates solely on server logs without client interaction
Outperforms state-of-the-art detection methods
Abstract
Automated bots now account for roughly half of all web requests, and an increasing number deliberately spoof their identity, either to evade detection or to disregard robots.txt. Existing countermeasures are either resource-intensive (JavaScript challenges, CAPTCHAs), cost-prohibitive (commercial solutions), or degrade the user experience. This paper proposes a lightweight, passive approach to bot detection that combines user-agent string analysis with favicon-based heuristics, operating entirely on standard web server logs with no client-side interaction. We evaluate the method on over 4.6 million requests containing 54,945 unique user-agent strings collected from websites hosted around the world. Our approach detects 67.7% of bot traffic while maintaining a false-positive rate of 3%, outperforming the state of the art (less than 20%). This method can serve as a first line of defence,…
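Illustrative sketch
The abstract does not spell out the exact heuristics, so the following is only a minimal sketch of how a passive, log-only check of this kind could look, not the authors' implementation. It assumes logs in the Combined Log Format, groups requests by (IP, user-agent), and applies two hypothetical rules: a user-agent containing a self-declared bot token is labelled "declared-bot", and a browser-like user-agent that never requests /favicon.ico is labelled "suspected-bot". The token list, the labels, and the function name classify_clients are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)

# Hypothetical token list; a real deployment would need a curated one.
BOT_TOKENS = ("bot", "crawler", "spider", "curl", "python-requests")


def classify_clients(lines):
    """Label each (ip, user-agent) pair seen in the log lines.

    Assumed rules (not taken from the paper):
      - UA contains a bot token                  -> "declared-bot"
      - UA looks like a browser, no favicon hit  -> "suspected-bot"
      - anything else                            -> "unflagged"
    """
    seen_favicon = defaultdict(bool)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # skip lines that are not in the expected format
        key = (m["ip"], m["ua"])
        path = m["path"].split("?")[0]
        # Record the client, and whether it ever asked for the favicon.
        seen_favicon[key] |= path.endswith("/favicon.ico")

    labels = {}
    for (ip, ua), fetched in seen_favicon.items():
        ua_lower = ua.lower()
        if any(token in ua_lower for token in BOT_TOKENS):
            labels[(ip, ua)] = "declared-bot"
        elif ua_lower.startswith("mozilla/") and not fetched:
            labels[(ip, ua)] = "suspected-bot"
        else:
            labels[(ip, ua)] = "unflagged"
    return labels
```

Calling classify_clients on an open log file (for example, `classify_clients(open("access.log"))`, where the file name is a placeholder) returns a dictionary keyed by (IP, user-agent). Because browsers cache favicons, a rule like this would likely need a long observation window to keep false positives low.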
