Publishing Wikipedia usage data with strong privacy guarantees
Temilola Adeleye, Skye Berghel, Damien Desfontaines, Michael Hay, Isaac Johnson, Cléo Lemoisson, Ashwin Machanavajjhala, Tom Magerlein, Gabriele Modena, David Pujol, Daniel Simmons-Marengo, and Hal Triedman

TL;DR
The paper details how the Wikimedia Foundation began publishing Wikipedia page view data with country-level granularity using differential privacy, balancing data utility with strong privacy guarantees for users.
Contribution
It describes a differentially private release of daily Wikipedia page view counts broken down by country, covering the goals of the release, the process from inception to deployment, the algorithms used to produce the data, and the outcomes.
Findings
Successful deployment of privacy-preserving data publication
Enhanced granularity of Wikipedia usage statistics
Strong privacy guarantees for user data
Abstract
For almost 20 years, the Wikimedia Foundation has been publishing statistics about how many people visited each Wikipedia page on each day. This data helps Wikipedia editors determine where to focus their efforts to improve the online encyclopedia, and enables academic research. In June 2023, the Wikimedia Foundation, helped by Tumult Labs, addressed a long-standing request from Wikipedia editors and academic researchers: it started publishing these statistics with finer granularity, including the country of origin in the daily counts of page views. This new data publication uses differential privacy to provide robust guarantees to people browsing or editing Wikipedia. This paper describes this data publication: its goals, the process followed from its inception to its deployment, the algorithms used to produce the data, and the outcomes of the data release.
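To make the idea of differentially private page view counts concrete, here is a minimal sketch of one textbook approach: cap how many views each person can contribute, count views per (page, country), and add Laplace noise to every count in a fixed public list of keys. This is not the pipeline described in the paper. The `user_id` field, the function names, and the parameters `epsilon` and `max_per_user` are assumptions made purely for illustration.

```python
import random
from collections import Counter, defaultdict


def bound_contributions(views, max_per_user):
    """Keep at most `max_per_user` views per user.

    `views` is an iterable of (user_id, page, country) tuples; the
    user_id field is hypothetical and used only for this illustration.
    """
    kept, seen = [], defaultdict(int)
    for user_id, page, country in views:
        if seen[user_id] < max_per_user:
            seen[user_id] += 1
            kept.append((page, country))
    return kept


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(views, keys, epsilon, max_per_user, rng=None):
    """Return a noisy count for every (page, country) pair in `keys`.

    `keys` must be a fixed, public list chosen independently of the data,
    so that zero counts also receive noise. Adding or removing one user
    changes the counts by at most `max_per_user` in L1 norm, so Laplace
    noise with scale max_per_user / epsilon gives epsilon-DP at the
    user level under these assumptions.
    """
    rng = rng or random.Random()
    counts = Counter(bound_contributions(views, max_per_user))
    scale = max_per_user / epsilon
    return {key: counts[key] + laplace_noise(scale, rng) for key in keys}


# Example with made-up data.
example_views = [
    ("u1", "Page_A", "FR"),
    ("u1", "Page_A", "FR"),
    ("u1", "Page_B", "FR"),  # dropped: u1 already contributed 2 views
    ("u2", "Page_A", "FR"),
]
example_keys = [("Page_A", "FR"), ("Page_B", "FR")]
released = noisy_counts(example_views, example_keys, epsilon=1.0, max_per_user=2)
```

A smaller `epsilon` or a larger `max_per_user` increases the noise scale, which is the basic trade-off between privacy and data utility that the abstract alludes to.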
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Wikis in Education and Collaboration · Hate Speech and Cyberbullying Detection
