"Having Confidence in My Confidence Intervals": How Data Users Engage with Privacy-Protected Wikipedia Data
Harold Triedman, Jayshree Sarathy, Priyanka Nanayakkara, Rachel Cummings, Gabriel Kaptchuk, Sean Kross, Elissa M. Redmiles

TL;DR
This study investigates how data users interpret and work with privacy-preserving noise in Wikipedia datasets, revealing challenges in understanding confidence intervals and misconceptions about privacy strength, and offers design improvements.
Contribution
It provides empirical insights into user interactions with privacy-noised data and proposes documentation and tool enhancements for better usability and understanding.
Findings
Participants readily used simple uncertainty metrics
Participants struggled to compute confidence intervals across noisy data
Some participants wrongly believed that stronger privacy implied weaker utility
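To make the confidence-interval finding concrete, here is a minimal sketch of the kind of task involved: putting an interval around a sum of several noisy counts. It assumes each published count carries independent, zero-mean Gaussian noise with a known standard deviation `sigma`. That noise model, the function names, and the example numbers are illustrative assumptions, not the actual release parameters of the Wikipedia datasets or the study's task materials. The first function uses the closed-form result that independent noise variances add. The second reaches a similar interval by simulation.

```python
import math
import random


def sum_ci_gaussian(noisy_counts, sigma, z=1.96):
    """Approximate 95% interval for the true sum, assuming each count has
    independent zero-mean Gaussian noise with standard deviation `sigma`
    (an illustrative assumption, not a documented dataset parameter)."""
    total = sum(noisy_counts)
    # Variances of independent noise terms add, so the standard deviation
    # of the summed noise grows with the square root of the number of terms.
    half_width = z * sigma * math.sqrt(len(noisy_counts))
    return total - half_width, total + half_width


def sum_ci_simulated(noisy_counts, sigma, trials=20000, seed=0):
    """Simulation-based version: draw the summed noise many times and read
    off its empirical 2.5% and 97.5% quantiles."""
    rng = random.Random(seed)
    total = sum(noisy_counts)
    n = len(noisy_counts)
    errors = sorted(
        sum(rng.gauss(0, sigma) for _ in range(n)) for _ in range(trials)
    )
    err_lo = errors[int(0.025 * trials)]
    err_hi = errors[int(0.975 * trials) - 1]
    # noisy_sum = true_sum + error, so true_sum = noisy_sum - error.
    return total - err_hi, total - err_lo


if __name__ == "__main__":
    counts = [100, 200, 300]  # hypothetical noisy pageview counts
    print(sum_ci_gaussian(counts, sigma=10))
    print(sum_ci_simulated(counts, sigma=10))
```

The simulation route needs only the ability to sample from the stated noise distribution, so it carries over to noise models that lack a convenient closed form for sums.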
Abstract
In response to calls for open data and growing privacy threats, organizations are increasingly adopting privacy-preserving techniques such as differential privacy (DP) that inject statistical noise when generating published datasets. These techniques are designed to protect privacy of data subjects while enabling useful analyses, but their reception by data users is under-explored. We developed documentation that presents the noise characteristics of two Wikipedia pageview datasets: one using rounding (heuristic privacy) and another using DP (formal privacy). After incorporating expert feedback (n=5), we used these documents to conduct a task-based contextual inquiry (n=15) exploring how data users--largely unfamiliar with these methods--perceive, interact with, and interpret privacy-preserving noise during data analysis. Participants readily used simple uncertainty metrics from the…
Taxonomy
Topics: Wikis in Education and Collaboration · Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI
