Analysing and strengthening OpenWPM's reliability
Benjamin Krumnow, Hugo Jonker, Stefan Karsch

TL;DR
This paper examines the detectability and data integrity of OpenWPM, a popular web automation framework, revealing its vulnerabilities to detection and evasion, and proposing mitigations to enhance its reliability.
Contribution
It provides a detailed analysis of OpenWPM's detectability, uncovers new evasion techniques, and develops mitigations to improve automation reliability.
Findings
OpenWPM is easily detectable; in a measurement of 100,000 sites, detection of OpenWPM clients was observed on roughly 14% of front pages.
Scripts contain routines to detect OpenWPM clients.
Novel evasion techniques against OpenWPM were identified.
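To make the idea of fingerprint-based detection concrete, here is a minimal sketch of how a script might flag an automated client from a snapshot of browser properties. It is not the paper's method. The `navigator.webdriver` flag comes from the W3C WebDriver specification; the function name, the snapshot layout, and the `window.exampleInstrumentHook` property are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Mapping


def automation_indicators(
    snapshot: Mapping[str, object],
    unexpected_globals: Iterable[str] = (),
) -> List[str]:
    """Return the names of properties in `snapshot` that suggest automation."""
    hits: List[str] = []
    # The WebDriver specification sets navigator.webdriver to true
    # when the browser is controlled by automation.
    if snapshot.get("navigator.webdriver") is True:
        hits.append("navigator.webdriver")
    # Hypothetical check: any caller-supplied global that a regular
    # browser would not expose counts as a further indicator.
    for name in unexpected_globals:
        if name in snapshot:
            hits.append(name)
    return hits


# Illustrative snapshots (made-up values, not measured data).
regular = {"navigator.webdriver": False}
automated = {
    "navigator.webdriver": True,
    "window.exampleInstrumentHook": "function",
}

suspicious = ["window.exampleInstrumentHook"]
print(automation_indicators(regular, suspicious))    # []
print(automation_indicators(automated, suspicious))  # both names are reported
```

A real detector would run in the page as JavaScript and would likely combine many more signals; the sketch only shows the general shape of such a check.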
Abstract
Automated browsers are widely used to study the web at scale. Their premise is that they measure what regular browsers would encounter on the web. In practice, deviations due to detection of automation have been found. To what extent automated browsers can be improved to reduce such deviations has so far not been investigated in detail. In this paper, we investigate this for a specific web automation framework: OpenWPM, a popular research framework specifically designed to study web privacy. We analyse (1) detectability of OpenWPM, (2) prevalence of OpenWPM detection, and (3) integrity of OpenWPM's data recording. Our analysis reveals OpenWPM is easily detectable. We measure to what extent fingerprint-based detection is already leveraged against OpenWPM clients on 100,000 sites and observe that it is commonly detected (~14% of front pages). Moreover, we discover integrated routines in…
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques · Privacy, Security, and Data Protection
