Identifying Participants in the Personal Genome Project by Name (A Re-identification Experiment)
Latanya Sweeney, Akua Abu, Julia Winn

TL;DR
This study correctly re-identified 84 to 97 percent of the Personal Genome Project profiles it named by linking demographic data to public records, highlighting privacy vulnerabilities and proposing mitigation strategies.
Contribution
It reveals a significant re-identification risk in genomic datasets through demographic linkage and suggests practical methods to enhance privacy protection.
Findings
84-97% of named profiles correctly re-identified
Demographics alone enable re-identification
Proposed remedies to improve privacy
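To make the second and third findings concrete, the sketch below measures what share of records is unique on a set of demographic fields, before and after coarsening one field. The data is synthetic, the helper `share_unique` is hypothetical, and reducing date of birth to year of birth is an illustrative assumption, not necessarily the remedy the paper proposes.

```python
from collections import Counter

def share_unique(records, key):
    """Return the fraction of records whose key value occurs exactly once."""
    counts = Counter(key(r) for r in records)
    unique = sum(1 for r in records if counts[key(r)] == 1)
    return unique / len(records)

# Synthetic (date of birth, gender, postal code) tuples; not real data.
people = [
    ("1970-01-02", "F", "02138"),
    ("1970-03-04", "F", "02138"),
    ("1970-07-08", "F", "02138"),
    ("1980-05-06", "M", "02139"),
]

# Full date of birth: every record is unique in this toy set.
full = share_unique(people, key=lambda r: r)

# Assumed coarsening: keep only the year of birth.
coarse = share_unique(people, key=lambda r: (r[0][:4], r[1], r[2]))

print(full, coarse)
```

In this toy set, coarsening the birth date lets three of the four records share a demographic group, which is the kind of feedback a participant could use before deciding what to publish.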
Abstract
We linked names and contact information to publicly available profiles in the Personal Genome Project. These profiles contain medical and genomic information, including details about medications, procedures, and diseases, as well as demographic information such as date of birth, gender, and postal code. By linking demographics to public records such as voter lists, and by mining for names hidden in attached documents, we correctly identified 84 to 97 percent of the profiles for which we provided names. Our ability to learn participants' names rests on their demographics, not their DNA, which revisits an old vulnerability that could be easily thwarted with minimal loss of research value. We therefore propose technical remedies that let people learn how identifying their demographics are so they can make better decisions.
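The demographic linkage described in the abstract can be sketched as a join on date of birth, gender, and postal code that keeps only unambiguous matches. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the function `link_profiles`, the field names, and all records are hypothetical and synthetic, and mining attached documents for names is not covered.

```python
from collections import defaultdict

def link_profiles(profiles, voter_records):
    """Map profile id -> name where exactly one voter record shares
    the profile's (dob, gender, zip) triple."""
    # Index the public records by the demographic triple.
    index = defaultdict(list)
    for rec in voter_records:
        index[(rec["dob"], rec["gender"], rec["zip"])].append(rec["name"])

    matches = {}
    for p in profiles:
        candidates = index.get((p["dob"], p["gender"], p["zip"]), [])
        # Only a single candidate counts as a re-identification;
        # zero or several candidates leave the profile unnamed.
        if len(candidates) == 1:
            matches[p["id"]] = candidates[0]
    return matches

# Synthetic stand-in for a voter list.
voters = [
    {"name": "Alice Example", "dob": "1970-01-02", "gender": "F", "zip": "02138"},
    {"name": "Bob Example", "dob": "1980-05-06", "gender": "M", "zip": "02139"},
    {"name": "Carl Example", "dob": "1980-05-06", "gender": "M", "zip": "02139"},
]

# Synthetic stand-in for public profiles (no names attached).
profiles = [
    {"id": "p1", "dob": "1970-01-02", "gender": "F", "zip": "02138"},
    {"id": "p2", "dob": "1980-05-06", "gender": "M", "zip": "02139"},
    {"id": "p3", "dob": "1990-09-09", "gender": "F", "zip": "02140"},
]

matches = link_profiles(profiles, voters)
print(matches)  # {'p1': 'Alice Example'}
```

Here `p2` stays unnamed because two voter records share its demographics, and `p3` has no matching record at all; only `p1` is uniquely linked.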
Taxonomy
Topics: Climate Change Communication and Perception
