Blocked or Broken? Automatically Detecting When Privacy Interventions Break Websites
Michael Smith, Peter Snyder, Moritz Haller, Benjamin Livshits, Deian Stefan, Hamed Haddadi

TL;DR
This paper introduces an automated classifier system that predicts when privacy filter rules will break websites, aiming to improve filter list accuracy and reduce site breakage by enabling pre-deployment testing.
Contribution
It presents the first automated, machine learning-based system for predicting website breakage caused by privacy filter rules, trained on compatibility data and browser instrumentation.
Findings
The classifier achieves an AUC of 0.88 in predicting breakage.
The system requires no human intervention for assessing filter rule risks.
Identifies key website behaviors that predict breakage.
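For readers unfamiliar with the AUC metric cited above, the following is a minimal sketch of how it can be computed for a binary breakage classifier. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. AUC is the probability that a randomly chosen breaking case receives a higher score than a randomly chosen non-breaking case, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison of positives and negatives.

    labels: iterable of 1 (rule broke the site) or 0 (it did not).
    scores: classifier scores, higher meaning "more likely to break".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.3, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so the reported 0.88 means the classifier ranks a breaking case above a non-breaking one in roughly 88% of such pairs.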
Abstract
A core problem in the development and maintenance of crowd-sourced filter lists is that their maintainers cannot confidently predict whether (and where) a new filter list rule will break websites. This is a result of the enormity of the Web, which prevents filter list authors from broadly understanding the impact of a new blocking rule before they ship it to millions of users. The inability of filter list authors to evaluate the Web compatibility impact of a new rule before shipping it severely reduces the benefits of filter-list-based content blocking: filter lists are both overly conservative (i.e., rules are tailored narrowly to reduce the risk of breaking things) and error-prone (i.e., blocking tools still break large numbers of sites). To scale to the size and scope of the Web, filter list authors need an automated system to detect when a new filter rule breaks websites, before that…
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection · Network Security and Intrusion Detection
