Botcha: Detecting Malicious Non-Human Traffic in the Wild
Sunny Dhamnani, Ritwik Sinha, Vishwa Vinay, Lilly Kumari, Margarita Savova

TL;DR
This paper introduces Botcha, a PU learning-based system for detecting malicious bots in web traffic. It outperforms standard PU learning methods by remaining robust when the labeled positives are not a random sample of all positives.
Contribution
Proposes two modifications to PU learning to improve robustness in detecting malicious bots in web traffic, even when positive labels are not randomly sampled.
Findings
Outperforms standard PU learning methods in identifying humans.
Effective on both public and proprietary datasets.
Enhances bot detection accuracy in adversarial web environments.
Abstract
Malicious bots make up about a quarter of all traffic on the web and degrade the performance of personalization and recommendation algorithms that operate on e-commerce sites. Positive-Unlabeled learning (PU learning) provides the ability to train a binary classifier using only positive (P) and unlabeled (U) instances. The unlabeled data comprises both positive and negative classes. It is possible to find labels for strict subsets of non-malicious actors, e.g., by assuming that only humans make purchases during web sessions or clear CAPTCHAs. However, finding signals of malicious behavior is almost impossible due to the ever-evolving and adversarial nature of bots. Such a set-up naturally lends itself to PU learning. Unfortunately, standard PU learning approaches assume that the labeled set of positives is a random sample of all positives; this is unlikely to hold in practice. In…
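To make the standard PU learning baseline mentioned in the abstract concrete, here is a minimal sketch of the classic Elkan and Noto style correction, which relies on exactly the random-sampling assumption the abstract questions. It is not the Botcha method. The 1-D synthetic "sessions", the 30% labeling rate, the tiny logistic regression, and all function names are assumptions made for illustration only.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: one feature per session.
# Humans (positives) cluster around +2, bots (negatives) around -2.
humans = [random.gauss(2.0, 1.0) for _ in range(200)]
bots = [random.gauss(-2.0, 1.0) for _ in range(200)]

# Label 30% of humans completely at random (the assumption standard PU
# learning relies on); everything else, humans and bots alike, is unlabeled.
labeled_pos = [x for x in humans if random.random() < 0.3]
labeled_set = set(labeled_pos)
unlabeled = [x for x in humans if x not in labeled_set] + bots

xs = labeled_pos + unlabeled
ss = [1] * len(labeled_pos) + [0] * len(unlabeled)  # s = 1 means "labeled"


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ss, lr=0.5, epochs=300):
    """Fit g(x) = P(s=1 | x) by batch gradient descent on 1-D inputs."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Probability that x is *labeled*, not that it is positive."""
    return sigmoid(w * x + b)


def pu_posterior(w, b, c, x):
    """Corrected estimate of P(y=1 | x) = g(x) / c, clipped to 1."""
    return min(1.0, predict(w, b, x) / c)


w, b = train_logistic(xs, ss)

# c estimates P(s=1 | y=1) as the mean of g over labeled positives.
# A held-out split is normally used here; it is skipped for brevity.
c = sum(predict(w, b, x) for x in labeled_pos) / len(labeled_pos)

print("estimated label frequency c:", round(c, 3))
print("P(human | x=+2):", round(pu_posterior(w, b, c, 2.0), 3))
print("P(human | x=-2):", round(pu_posterior(w, b, c, -2.0), 3))
```

The division by a single constant `c` is only justified when every positive is equally likely to be labeled. If labels come from a biased source, such as only sessions that end in a purchase, that constant no longer applies uniformly, which is the failure mode the abstract points to.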
Taxonomy
Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
