Evolution of Privacy Loss in Wikipedia
Marian-Andrei Rizoiu, Lexing Xie, Tiberio Caetano, Manuel Cebrian

TL;DR
This study investigates how individual privacy diminishes over time on Wikipedia by analyzing 13 years of edit history, revealing that personal traits can be inferred with increasing accuracy even from broad activity features.
Contribution
The paper shows that privacy loss accumulates over time on Wikipedia: off-the-shelf machine learning algorithms can predict personal traits from coarse activity data, which highlights long-term privacy risks.
Findings
Prediction accuracy for private traits improves over time.
The traits of users who have stopped editing also become more predictable over time.
Activity by new users contributes more to privacy loss than activity by existing users.
Abstract
The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits,…
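The feature description in the abstract (edit counts over predefined broad categories within a given time period) can be illustrated with a minimal Python sketch. The record layout, the `category_counts` helper, and the three-entry category list are assumptions made for illustration and are not the authors' pipeline; the abstract names Mathematics, Culture and Nature only as examples of categories.

```python
from collections import Counter
from datetime import date

# Hypothetical category list: the abstract cites these three only as examples.
CATEGORIES = ["Mathematics", "Culture", "Nature"]


def category_counts(edits, start, end):
    """Count each user's edits per broad category within [start, end).

    `edits` is an iterable of (user, category, day) tuples; this record
    layout is an assumption of the sketch, not the paper's data format.
    Returns {user: [count for each entry of CATEGORIES]}.
    """
    counts = {}
    for user, category, day in edits:
        # Keep only edits inside the time window and in a known category.
        if start <= day < end and category in CATEGORIES:
            counts.setdefault(user, Counter())[category] += 1
    # A Counter returns 0 for categories the user never edited.
    return {u: [c[cat] for cat in CATEGORIES] for u, c in counts.items()}


# Toy edit log, invented for the example.
edits = [
    ("alice", "Mathematics", date(2005, 3, 1)),
    ("alice", "Mathematics", date(2005, 6, 9)),
    ("alice", "Nature", date(2006, 1, 2)),
    ("bob", "Culture", date(2005, 11, 30)),
]

features = category_counts(edits, date(2005, 1, 1), date(2006, 1, 1))
print(features)
```

Each user ends up with one fixed-length count vector per time window, which is the kind of coarse input an off-the-shelf classifier could take when predicting a trait.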
