Filter List Generation for Underserved Regions
Alexander Sjosten, Peter Snyder, Antonio Pastor, Panagiotis Papadopoulos, Benjamin Livshits

TL;DR
This paper introduces a two-step pipeline that combines deep browser instrumentation with a multilingual ad classifier to generate filter lists for regions of the web that crowd-sourced lists cover poorly, such as those serving languages with relatively few speakers.
Contribution
The paper presents a method for creating regional filter lists using request chain analysis and a multilingual ad classifier, addressing coverage gaps in existing crowd-sourced filter lists.
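As a rough illustration of how request chain analysis and an ad classifier could fit together, the sketch below walks from each image flagged as an ad back through the requests that caused it to load, then emits a blocking rule for the earliest request in that chain that is not served by the page's own host. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `chain_to_root`, `candidate_rules`, and `initiator_of` are hypothetical, the classifier output is stood in for by a plain list of URLs, and the domain-wide `||host^` rule (Adblock Plus filter syntax) is a deliberately coarse choice made for brevity.

```python
from urllib.parse import urlsplit


def chain_to_root(url, initiator_of):
    """Follow initiator links upward; returns [url, parent, ..., root]."""
    chain = [url]
    seen = {url}
    while url in initiator_of:
        url = initiator_of[url]
        if url in seen:  # guard against cycles in the recorded data
            break
        seen.add(url)
        chain.append(url)
    return chain


def candidate_rules(ad_images, initiator_of, page_host):
    """Emit one domain rule per ad image (hypothetical helper).

    ad_images:    URLs that a classifier has labelled as ads (assumed input).
    initiator_of: maps each request URL to the URL that triggered it
                  (assumed to come from browser instrumentation).
    page_host:    hostname of the page being analysed.
    """
    rules = set()
    for image_url in ad_images:
        chain = chain_to_root(image_url, initiator_of)
        # Walk from the page toward the image and stop at the first
        # request that is not served by the page's own host.
        for url in reversed(chain):
            host = urlsplit(url).hostname
            if host and host != page_host:
                rules.add(f"||{host}^")
                break
    return sorted(rules)


if __name__ == "__main__":
    # Illustrative data only; these URLs are made up.
    initiator_of = {
        "https://cdn.ads.example/banner.png": "https://ads.example/loader.js",
        "https://ads.example/loader.js": "https://news.example/index.html",
    }
    flagged = ["https://cdn.ads.example/banner.png"]
    print(candidate_rules(flagged, initiator_of, "news.example"))
```

Blocking at the earliest third-party request in the chain, rather than at the image itself, is one possible design choice: it would also stop the script that fetches further ads. A practical system would need narrower rules and a check that blocking does not break the page.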
Findings
The generated filter lists blocked 3,349 additional ad resources in the target regions.
The pipeline was applied to three regions: Sri Lanka, Hungary, and Albania.
Filter list coverage improved for underserved linguistic and geographic web communities.
Abstract
Filter lists play a large and growing role in protecting and assisting web users. The vast majority of popular filter lists are crowd-sourced, where a large number of people manually label resources related to undesirable web resources (e.g., ads, trackers, paywall libraries), so that they can be blocked by browsers and extensions. Because only a small percentage of web users participate in the generation of filter lists, a crowd-sourcing strategy works well for blocking either uncommon resources that appear on "popular" websites, or resources that appear on a large number of "unpopular" websites. A crowd-sourcing strategy will perform poorly for parts of the web with small "crowds", such as regions of the web serving languages with (relatively) few speakers. This work addresses this problem through the combination of two novel techniques: (i) deep browser instrumentation that allows…
