When Handshakes Tell the Truth: Detecting Web Bad Bots via TLS Fingerprints
Ghalia Jarad, Kemal Bicakci

TL;DR
This paper presents a TLS fingerprinting approach using JA4 and machine learning classifiers to accurately detect malicious web bots, even those mimicking human behavior, with high precision and robustness.
Contribution
It introduces a protocol-level bot detection method that applies machine learning classifiers to TLS fingerprints, reaches 98.63% accuracy on the test set with CatBoost, and identifies the most influential fingerprint features.
Findings
CatBoost achieved an AUC of 0.998 and F1 score of 0.9734.
TLS fingerprint features like ja4_b, cipher_count, and ext_count are highly influential.
The CatBoost model reached 98.63% accuracy on the test set.
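The features named above can be read from the JA4 string itself. The following is a minimal sketch, assuming the publicly documented JA4 layout: a 10-character first section, followed by two truncated hashes, all separated by underscores. The function name `parse_ja4`, the sample fingerprint, and every dictionary key other than `ja4_b`, `cipher_count`, and `ext_count` are illustrative assumptions, not the paper's own code or feature set.

```python
def parse_ja4(fingerprint: str) -> dict:
    """Split a JA4 TLS client fingerprint into simple features.

    Assumed layout: <a>_<b>_<c>, where the 10-character <a> section is
    protocol (1) + TLS version (2) + SNI flag (1) + cipher count (2)
    + extension count (2) + ALPN hint (2).
    """
    parts = fingerprint.split("_")
    if len(parts) != 3 or len(parts[0]) != 10:
        raise ValueError(f"not a JA4 fingerprint: {fingerprint!r}")
    ja4_a, ja4_b, ja4_c = parts
    return {
        "protocol": ja4_a[0],              # 't' = TCP, 'q' = QUIC
        "tls_version": ja4_a[1:3],         # e.g. '13' for TLS 1.3
        "sni": ja4_a[3],                   # 'd' = domain present, 'i' = none
        "cipher_count": int(ja4_a[4:6]),   # number of offered cipher suites
        "ext_count": int(ja4_a[6:8]),      # number of offered extensions
        "alpn": ja4_a[8:10],               # first and last char of first ALPN value
        "ja4_b": ja4_b,                    # truncated hash of the sorted cipher list
        "ja4_c": ja4_c,                    # truncated hash of extensions and signature algorithms
    }


# Illustrative fingerprint only; not drawn from JA4DB.
features = parse_ja4("t13d1516h2_8daaf6152771_e5627efa2ab1")
print(features["cipher_count"], features["ext_count"], features["ja4_b"])
```

Because `ja4_b` is a hash rather than a number, a classifier has to treat it as a categorical feature. CatBoost accepts categorical columns directly, whereas how the paper encoded this feature for XGBoost is not stated here.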
Abstract
Automated traffic continues to surpass human-generated traffic on the web, and a rising proportion of this automation is explicitly malicious. Evasive bots can pretend to be real users, solve CAPTCHAs, and mimic human interaction patterns. This work explores a less intrusive, protocol-level method: using TLS fingerprinting with the JA4 technique to distinguish bots from real users. Two gradient-boosted machine learning classifiers (XGBoost and CatBoost) were trained and evaluated on a dataset of real TLS fingerprints (JA4DB). A feature extraction step first derived informative signals from the JA4 fingerprints, which describe TLS handshake parameters. The CatBoost model performed better, achieving an AUC of 0.998, an F1 score of 0.9734, and an accuracy of 0.9863 on the test set. The XGBoost model showed comparable results. Feature significance analyses identified JA4…
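The three metrics quoted in the abstract answer different questions: accuracy and F1 depend on a decision threshold, while AUC measures how well the scores rank bots above humans regardless of threshold. The sketch below computes all three in plain Python on made-up labels and scores. The data, the 0.5 threshold, and the function names are assumptions for illustration only; this is not the paper's evaluation code and uses no JA4DB data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (bot) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = bot, 0 = human; scores are hypothetical model outputs.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

On this toy data the model misclassifies one bot and one human at the chosen threshold, yet still ranks eight of the nine bot/human pairs correctly, which is why AUC can sit well above accuracy and F1.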
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection · User Authentication and Security Systems
