Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection
Sarah Bird, Vikas Mishra, Steven Englehardt, Rob Willoughby, David Zeber, Walter Rudametkin, Martin Lopatka

TL;DR
This paper introduces a semi-supervised machine learning method for detecting browser fingerprinting scripts, effectively identifying known and new scripts, including previously undetected device-class fingerprinting in the wild.
Contribution
It presents a novel semi-supervised approach that groups scripts by API access patterns, expanding detection beyond heuristics and uncovering new fingerprinting scripts.
Findings
Detected over 94.9% of scripts identified by heuristics
Uncovered fingerprinting scripts missed by heuristics
Identified over 100 device-class fingerprinting scripts in the wild
Abstract
As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability…
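The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the similarity measure (cosine over API-call counts), the single-link grouping, the `threshold` value, the helper names `group_scripts` and `propagate_labels`, and the three example scripts are all assumptions made for illustration. Scripts with similar API access patterns are grouped, and a small seed set of known fingerprinting scripts is used to flag every other script in the same group.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two API-call count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_scripts(traces, threshold=0.7):
    """Single-link grouping: two scripts share a group when their
    similarity is at least `threshold` (hypothetical value)."""
    names = list(traces)
    vectors = {n: Counter(traces[n]) for n in names}
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def propagate_labels(groups, seeds):
    """Flag every script that shares a group with a seed-labeled script."""
    flagged = set()
    for group in groups:
        if group & seeds:
            flagged |= group
    return flagged


# Illustrative traces only: script names and call lists are made up.
traces = {
    "seed_fp.js": ["HTMLCanvasElement.toDataURL", "CanvasRenderingContext2D.fillText",
                   "Navigator.userAgent", "Screen.width"],
    "unknown_a.js": ["HTMLCanvasElement.toDataURL", "CanvasRenderingContext2D.fillText",
                     "Navigator.userAgent", "Screen.height"],
    "carousel.js": ["Document.querySelector", "Element.classList",
                    "Window.requestAnimationFrame"],
}

groups = group_scripts(traces)
flagged = propagate_labels(groups, seeds={"seed_fp.js"})
print(sorted(flagged))
```

Here `unknown_a.js` does not match the seed script call for call, yet it lands in the same group and is flagged, while `carousel.js` is left alone. A real system would need a more robust clustering method and a carefully chosen threshold.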
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Privacy, Security, and Data Protection · Spam and Phishing Detection
