How To Break Anonymity of the Netflix Prize Dataset
Arvind Narayanan, Vitaly Shmatikov

TL;DR
This paper introduces robust statistical de-anonymization techniques that re-identify individuals in high-dimensional micro-data. On the Netflix Prize dataset, an adversary with minimal background knowledge can locate a subscriber's record and learn potentially sensitive information from it.
Contribution
The paper presents a new class of de-anonymization attacks that are robust to perturbation of the data and tolerate errors and gaps in the adversary's background knowledge. The attacks are applied successfully to real-world high-dimensional data.
Findings
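De-anonymization is feasible with minimal background knowledge.
The techniques identify individual subscribers' records in the Netflix Prize dataset.
Linking these records to public data (the Internet Movie Database) uncovers potentially sensitive information, such as apparent political preferences.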
Abstract
We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
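To make the abstract's idea concrete, here is a minimal sketch of scoring-based de-anonymization, loosely modeled on the paper's approach: score every anonymized record against the adversary's auxiliary information, give more weight to rarely rated movies, and accept the best match only if it stands out clearly from the runner-up. All names (`similarity`, `movie_weights`, `score`, `deanonymize`), the record format, the rating and date tolerances, and the threshold `phi` are hypothetical choices for illustration. The paper's own scoring and similarity functions differ in detail.

```python
import math
import statistics
from typing import Dict, Optional, Tuple

# Hypothetical record format: movie id -> (rating on a 1-5 scale, day number of the rating).
Record = Dict[str, Tuple[int, int]]


def similarity(aux_val: Tuple[int, int], rec_val: Tuple[int, int],
               rating_tol: int = 1, day_tol: int = 14) -> float:
    """1.0 if rating and date agree within the (assumed) tolerances, else 0.0."""
    (r_aux, d_aux), (r_rec, d_rec) = aux_val, rec_val
    return 1.0 if abs(r_aux - r_rec) <= rating_tol and abs(d_aux - d_rec) <= day_tol else 0.0


def movie_weights(dataset: Dict[str, Record]) -> Dict[str, float]:
    """Weight each movie by 1/log(1 + number of raters): rarely rated movies count for more."""
    counts: Dict[str, int] = {}
    for record in dataset.values():
        for movie in record:
            counts[movie] = counts.get(movie, 0) + 1
    # The +1 keeps a movie rated by a single subscriber from dividing by log(1) = 0.
    return {movie: 1.0 / math.log(1 + c) for movie, c in counts.items()}


def score(aux: Record, record: Record, weights: Dict[str, float]) -> float:
    """Weighted count of auxiliary ratings that approximately match this record."""
    return sum(weights[m] * similarity(aux[m], record[m]) for m in aux if m in record)


def deanonymize(aux: Record, dataset: Dict[str, Record], phi: float = 1.5) -> Optional[str]:
    """Return the best-matching record id, or None if the top match does not stand out."""
    if len(dataset) < 2:
        return None
    weights = movie_weights(dataset)
    scores = {rid: score(aux, rec, weights) for rid, rec in dataset.items()}
    best, second = sorted(scores.values(), reverse=True)[:2]
    sigma = statistics.pstdev(list(scores.values()))
    if sigma == 0:
        return None  # every record scored the same: nothing to single out
    # Eccentricity test: the best score must beat the runner-up by phi standard deviations.
    if (best - second) / sigma < phi:
        return None
    return max(scores, key=scores.get)
```

The eccentricity test lets the adversary decline to guess when several records fit the auxiliary information equally well, which trades some missed matches for far fewer false ones. Because only approximate agreement is required, a few wrong ratings or dates in the auxiliary data still leave the true record on top.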
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Privacy, Security, and Data Protection
