Botcha: Detecting Malicious Non-Human Traffic in the Wild

Sunny Dhamnani; Ritwik Sinha; Vishwa Vinay; Lilly Kumari; Margarita Savova

arXiv:2103.01428 · cs.LG · March 3, 2021 · 1 citation

TL;DR

This paper introduces Botcha, a robust PU-learning-based system for detecting malicious bots in web traffic. It outperforms standard PU learning methods by relaxing their assumption that the labeled positives are a random sample of all positives.

Contribution

Proposes two modifications to PU learning that keep it robust for detecting malicious bots in web traffic even when the labeled positives are not a random sample of all positives.

Findings

1. Outperforms standard PU learning methods in identifying humans.
2. Is effective on both public and proprietary datasets.
3. Improves bot detection accuracy in adversarial web environments.

Abstract

Malicious bots make up about a quarter of all traffic on the web, and degrade the performance of personalization and recommendation algorithms that operate on e-commerce sites. Positive-Unlabeled learning (PU learning) provides the ability to train a binary classifier using only positive (P) and unlabeled (U) instances. The unlabeled data comprises both positive and negative classes. It is possible to find labels for strict subsets of non-malicious actors, e.g., by assuming that only humans make purchases during web sessions or clear CAPTCHAs. However, finding signals of malicious behavior is almost impossible due to the ever-evolving and adversarial nature of bots. Such a set-up naturally lends itself to PU learning. Unfortunately, standard PU learning approaches assume that the labeled set of positives is a random sample of all positives; this is unlikely to hold in practice. In…
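The sampling assumption the abstract criticizes can be made concrete with a small simulation. The sketch below implements one widely used standard PU learning method, the estimator of Elkan and Noto (2008), and not Botcha's own modifications, which the truncated abstract does not describe. It fits a classifier g(x) ≈ P(s=1 | x) separating labeled from unlabeled sessions, estimates the label frequency c = P(s=1 | y=1) as the mean of g over labeled sessions, and scores P(y=1 | x) = g(x) / c. Everything else is a hypothetical illustration: the single synthetic "humanness" feature, the Gaussian class distributions, the 30% random labeling rate, the biased labeling rule standing in for purchase or CAPTCHA signals, and the helper names `fit_logistic`, `elkan_noto` and `make_sessions`.

```python
import numpy as np


def sigmoid(z):
    # tanh form of the logistic function; avoids overflow in np.exp for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def add_intercept(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_logistic(X, s, n_iter=50, lam=1e-3):
    """Logistic regression via Newton's method with backtracking (tiny L2 term for stability)."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])

    def loss(w):
        z = Xb @ w
        # log(1 + e^z) - s*z is the negative log-likelihood of a logistic model
        return np.sum(np.logaddexp(0.0, z) - s * z) + 0.5 * lam * (w @ w)

    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - s) + lam * w
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + lam * np.eye(Xb.shape[1])
        step = np.linalg.solve(hess, grad)
        current, t = loss(w), 1.0
        # halve the Newton step until the loss does not increase
        while loss(w - t * step) > current and t > 1e-8:
            t *= 0.5
        w = w - t * step
    return w


def elkan_noto(X, s):
    """Standard PU learning under SCAR (Elkan and Noto, 2008).

    1. Fit g(x) ~ P(s=1 | x), i.e. labeled vs. unlabeled.
    2. Estimate c = P(s=1 | y=1) as the mean of g over labeled examples.
    3. Score P(y=1 | x) = g(x) / c, clipped to [0, 1].
    """
    w = fit_logistic(X, s)

    def g(Xq):
        return sigmoid(add_intercept(Xq) @ w)

    c_hat = g(X[s == 1]).mean()
    return c_hat, lambda Xq: np.clip(g(Xq) / c_hat, 0.0, 1.0)


def make_sessions(n, label_prob, rng):
    """Hypothetical data: one 'humanness' feature, humans ~ N(3, 1), bots ~ N(-3, 1).

    label_prob(x) is the chance that a human session receives a positive label
    (a stand-in for a purchase or a solved CAPTCHA); bots are never labeled.
    """
    x_h = rng.normal(3.0, 1.0, n)
    x_b = rng.normal(-3.0, 1.0, n)
    X = np.concatenate([x_h, x_b])[:, None]
    y = np.concatenate([np.ones(n), np.zeros(n)])
    s = np.zeros(2 * n)
    s[:n] = rng.random(n) < label_prob(x_h)
    return X, y, s


rng = np.random.default_rng(0)
results = {}
scenarios = {
    "random (SCAR)": lambda x: np.full_like(x, 0.3),  # every human equally likely to be labeled
    "biased": lambda x: sigmoid(4.0 * (x - 3.5)),     # mostly very human-looking sessions get labeled
}
for name, label_prob in scenarios.items():
    X, y, s = make_sessions(2000, label_prob, rng)
    c_hat, score = elkan_noto(X, s)
    c_true = s[y == 1].mean()
    recall = (score(X[y == 1]) >= 0.5).mean()
    results[name] = (c_true, c_hat, recall)
    print(f"{name}: true c = {c_true:.2f}, estimated c = {c_hat:.2f}, human recall = {recall:.2f}")
```

Under random labeling the estimated c lands close to the true labeling rate and most humans score above 0.5. When labels come mainly from the most human-looking sessions, c is overestimated and many unlabeled humans fall below the threshold, which is the failure mode the abstract attributes to the random-sampling assumption. For brevity c is estimated in-sample here; Elkan and Noto recommend estimating it on a held-out set of labeled positives.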

Taxonomy

Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Network Security and Intrusion Detection