Automatic Generation of Web Censorship Probe Lists

Jenny Tang; Leo Alvarez; Arjun Brar; Nguyen Phong Hoang; Nicolas Christin

arXiv:2407.08185 · cs.CR · July 12, 2024



TL;DR

This paper presents an automated method for generating and updating web censorship probe lists. It analyzes the content of existing URLs, expands the extracted topics into new candidate pages, and tests those candidates for accessibility from multiple locations, improving the scalability and accuracy of censorship measurement.
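To make the multi-location testing step concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each probe has already been run and recorded as a hypothetical ProbeResult (URL, vantage-point label, reachable flag), and it flags URLs that fail from at least one location while loading in at least one other probe. A URL that fails everywhere is more likely simply offline. Real measurements would compare response content, DNS answers, and TLS behavior rather than a single boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    url: str
    location: str    # hypothetical vantage-point label, e.g. a country code
    reachable: bool  # True if the fetch completed without error


def flag_inconsistent(results: list[ProbeResult], min_ok: int = 1) -> set[str]:
    """Return URLs that failed in at least one probe but succeeded in at
    least `min_ok` probes: a crude hint of location-specific blocking."""
    ok: dict[str, int] = {}
    failed: dict[str, int] = {}
    for r in results:
        counts = ok if r.reachable else failed
        counts[r.url] = counts.get(r.url, 0) + 1
    return {url for url in failed if ok.get(url, 0) >= min_ok}


# Example with made-up probe outcomes.
results = [
    ProbeResult("http://a.example/", "loc-1", True),
    ProbeResult("http://a.example/", "loc-2", False),  # fails only here
    ProbeResult("http://b.example/", "loc-1", False),
    ProbeResult("http://b.example/", "loc-2", False),  # fails everywhere
    ProbeResult("http://c.example/", "loc-1", True),
]
flagged = flag_inconsistent(results)
```

Here only http://a.example/ is flagged: it loads from one location and fails from another, while b.example fails from every location and c.example never fails.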

Contribution

It introduces a novel automated approach to generate and update web censorship probe lists using content analysis and search engine expansion, reducing manual effort and increasing coverage.
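As a rough illustration of content analysis followed by search-engine expansion, the Python sketch below ranks each page's terms by TF-IDF and turns the top terms, plus hand-supplied related terms, into query strings. TF-IDF is a stand-in chosen for simplicity; the paper's actual topic and keyword extraction may differ. The names top_keywords, build_queries, and the expansions mapping are hypothetical. Submitting the queries to a search engine is left out, since no particular search API is named here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; \w is Unicode-aware for str patterns in Python 3.
    # Languages written without spaces (e.g. Chinese) would need a real segmenter.
    return re.findall(r"\w+", text.lower())


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:k])
    return keywords


def build_queries(keywords: list[str], expansions: dict[str, list[str]]) -> list[str]:
    """Pair each keyword with its (hypothetical) related terms to form queries."""
    queries = []
    for kw in keywords:
        queries.append(kw)
        queries.extend(f"{kw} {rel}" for rel in expansions.get(kw, []))
    return queries


# Toy page texts standing in for fetched seed pages.
docs = [
    "protest rally protest downtown",
    "football match downtown",
    "protest march election",
]
kws = top_keywords(docs, k=2)
queries = build_queries(kws[0], {"rally": ["permit", "police"]})
```

A term that appears in every document scores zero, because its inverse document frequency is log(1) = 0; terms concentrated in one page rise to the top, which is what makes them useful as search seeds.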

Findings

1. Discovered over 1,400 new potentially censored domains.
2. Generated 119,255 new URLs from the initial seed URLs.
3. Demonstrated the feasibility of automated, scalable censorship measurement.
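Counting "new" domains implies comparing candidate domains against those already present in the seed URLs. The sketch below makes an assumption about granularity rather than following the paper's definition: it compares lowercased hostnames with a leading "www." removed, whereas a stricter pipeline might reduce hostnames to registrable domains using the Public Suffix List. It covers only the novelty check, not the censorship tests that decide whether a domain is potentially blocked. The helper names are hypothetical.

```python
from urllib.parse import urlsplit


def host_key(url: str) -> str:
    # urlsplit(...).hostname is already lowercased; drop a leading "www.".
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def new_domains(seed_urls: list[str], candidate_urls: list[str]) -> set[str]:
    """Hostnames among the candidates that never appear in the seed URLs."""
    seen = {host_key(u) for u in seed_urls}
    return {host_key(u) for u in candidate_urls} - seen - {""}


seed = ["https://www.example.org/a", "http://news.example.net/"]
candidates = [
    "https://example.org/b",          # same site as a seed URL
    "https://blog.example.com/post",  # not in the seed list
    "http://news.example.net/other",  # same host as a seed URL
]
fresh = new_domains(seed, candidates)
```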

Abstract

Domain probe lists, used to determine which URLs to probe for Web censorship, play a critical role in Internet censorship measurement studies. Indeed, the size and accuracy of the domain probe list limit the set of censored pages that can be detected; inaccurate lists can lead to an incomplete view of the censorship landscape or biased results. Previous efforts to generate domain probe lists have been mostly manual or crowdsourced. This approach is time-consuming, prone to errors, and does not scale well to the ever-changing censorship landscape. In this paper, we explore methods for automatically generating probe lists that are both comprehensive and up-to-date for Web censorship measurement. We start from an initial set of 139,957 unique URLs, drawn from various existing test lists and covering pages in a variety of languages, to generate new candidate pages. By analyzing content from…
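Combining several existing test lists into one set of unique URLs needs a canonical form, so that trivially different spellings of the same page are counted once. The normalization rules below (lowercased scheme and host, empty path becomes "/", fragment and userinfo dropped, query kept) are illustrative assumptions, not the rules the authors used, and the sketch assumes every entry includes a scheme such as http://. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonical form used only as a de-duplication key (illustrative rules)."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # lowercased; userinfo is discarded
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    path = parts.path or "/"
    # Keep the query (it can select different content); drop the fragment.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def merge_seed_lists(*lists: list[str]) -> list[str]:
    """Merge several URL lists, keeping the first occurrence of each key."""
    seen: set[str] = set()
    merged: list[str] = []
    for lst in lists:
        for url in lst:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


list_a = ["http://Example.org/page#top", "https://news.example.net"]
list_b = ["http://example.org/page", "https://news.example.net/", "https://example.com/?q=1"]
seeds = merge_seed_lists(list_a, list_b)
```

Five input entries collapse to three unique URLs here. Keeping the first occurrence preserves the order of the input lists, which can matter if earlier lists are considered more authoritative.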

