Automated Discovery of Internet Censorship by Web Crawling

Alexander Darer; Oliver Farnan; Joss Wright

arXiv:1804.03056 · cs.CY · April 20, 2018

TL;DR

This paper introduces an automated web crawling system to discover and map censored domains across different countries, significantly expanding existing filter lists and revealing the interconnected nature of censored content.

Contribution

The authors develop a fully automated method, requiring no human interaction, for identifying filtered domains by web crawling. It finds more filtered domains than existing techniques and produces larger, more comprehensive censorship datasets.

Findings

01

Successfully identified more filtered domains than existing methods.

02

Built domain filter lists an order of magnitude larger than public lists as of Jan 2018.

03

Mapped the interconnected network of censored web resources.

Abstract

Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about specific content that citizens are denied access to is atypical. To counter this, numerous techniques for maintaining URL filter lists have been proposed by various individuals and organisations that aim to provide empirical data on censorship for the benefit of the public and the wider censorship research community. We present a new approach for discovering filtered domains in different countries. This method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining if a domain is filtered. We demonstrate the effectiveness of the approach by running experiments to search for filtered content in four…
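The crawling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `crawl_filtered`, `fetch_links`, and `is_filtered`, the breadth-first order, and the `max_tests` budget are all assumptions made for illustration. The filtering test itself is left as a caller-supplied predicate, because the abstract does not describe how the paper decides that a domain is filtered.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercase host part of a URL ('' if it has none)."""
    return (urlparse(url).hostname or "").lower()


def crawl_filtered(
    seeds: Iterable[str],
    fetch_links: Callable[[str], List[str]],  # hypothetical: URLs linked from a domain
    is_filtered: Callable[[str], bool],       # hypothetical: the filtering test
    max_tests: int = 1000,                    # assumed budget; seeds are always tested
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first crawl that only expands domains judged to be filtered.

    Returns the filtered domains found and the (source, target) link edges
    between them.
    """
    filtered: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    tested: Set[str] = set()
    queue: deque = deque()

    for seed in seeds:
        if seed not in tested:
            tested.add(seed)
            if is_filtered(seed):
                filtered.add(seed)
                queue.append(seed)

    while queue:
        source = queue.popleft()
        for url in fetch_links(source):
            target = domain_of(url)
            if not target or target == source:
                continue
            if target not in tested:
                if len(tested) >= max_tests:
                    continue  # budget exhausted: skip domains not yet tested
                tested.add(target)
                if is_filtered(target):
                    filtered.add(target)
                    queue.append(target)
            if target in filtered:
                edges.add((source, target))
    return filtered, edges


# Toy in-memory "web" so the sketch runs without any network access.
TOY_WEB = {
    "blocked-a.example": ["http://blocked-b.example/page", "http://open.example/"],
    "blocked-b.example": ["http://blocked-a.example/", "http://blocked-c.example/x"],
    "blocked-c.example": [],
    "open.example": ["http://blocked-c.example/"],
}

found, links = crawl_filtered(
    seeds=["blocked-a.example"],
    fetch_links=lambda d: TOY_WEB.get(d, []),
    is_filtered=lambda d: d.startswith("blocked-"),  # stand-in for a real test
)
```

In this toy run the crawl starts from one filtered seed and reaches the other two filtered domains through their links, while the unfiltered domain is tested but never expanded. The returned edge set is one simple way to represent the links between filtered domains.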
