Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge
Arvind Narayanan, Elaine Shi, Benjamin I. P. Rubinstein

TL;DR
This paper describes how de-anonymizing the test set with an independent Flickr crawl won a social network link prediction contest. It shows that scrubbing user identities did not protect the graph, and it combines de-anonymization with conventional link prediction to cover the nodes that could not be re-identified.
Contribution
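The paper applies de-anonymization to a new target, gaming a machine learning contest, and introduces a simulated annealing-based weighted graph matching algorithm for the seeding step of the attack. It also shows how to combine de-anonymization with link prediction to improve contest performance.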
Findings
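De-anonymized much of the competition test set using the authors' own Flickr crawl
Combined de-anonymization with link prediction, which is still needed for the part of the test set that was not de-anonymized
Showed that scrubbing user identities from a crawled social graph does not prevent re-identification, suggesting changes in how future competitions should be run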
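The combination described in the findings can be sketched as a simple lookup-with-fallback rule. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `combined_score`, `mapping`, `crawl_edges`, and `fallback` are hypothetical, the crawl lookup is treated as exact (a real crawl may be stale or incomplete), and the fallback scorer stands in for whatever link predictor is used on the remaining edges.

```python
from typing import Callable, Dict, Hashable, Set, Tuple


def combined_score(
    edge: Tuple[int, int],
    mapping: Dict[int, Hashable],
    crawl_edges: Set[Tuple[Hashable, Hashable]],
    fallback: Callable[[int, int], float],
) -> float:
    """Score a candidate directed edge (u, v) from the anonymized test set.

    mapping     -- hypothetical de-anonymization result: anonymized id -> crawl id
    crawl_edges -- directed edges observed in the independent crawl
    fallback    -- any link predictor returning a score in [0, 1]
    """
    u, v = edge
    mu, mv = mapping.get(u), mapping.get(v)
    if mu is not None and mv is not None:
        # Both endpoints re-identified: read the answer off the crawl.
        return 1.0 if (mu, mv) in crawl_edges else 0.0
    # At least one endpoint is still anonymous: use ordinary link prediction.
    return fallback(u, v)
```

For example, with `mapping = {1: "a", 2: "b"}` and `crawl_edges = {("a", "b")}`, the edge `(1, 2)` is scored from the crawl, while any edge touching an unmapped node is passed to `fallback`.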
Abstract
This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was a graph obtained by crawling the popular Flickr social photo sharing website, with user identities scrubbed. By de-anonymizing much of the competition test set using our own Flickr crawl, we were able to effectively game the competition. Our attack represents a new application of de-anonymization to gaming machine learning contests, suggesting changes in how future competitions should be run. We introduce a new simulated annealing-based weighted graph matching algorithm for the seeding step of de-anonymization. We also show how to combine de-anonymization with link prediction (the latter is required to achieve good performance on the portion of the test set not de-anonymized) for…
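To make the idea of simulated annealing-based weighted graph matching concrete, here is a generic sketch. It is not the paper's algorithm: the squared-difference objective, the swap move, the geometric cooling schedule, and every name and parameter (`mismatch`, `anneal_match`, `steps`, `t_start`, `t_end`) are assumptions chosen for illustration. The inputs are two symmetric weight matrices over the same number of candidate seed nodes, one per graph.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def mismatch(a: Matrix, b: Matrix, perm: List[int]) -> float:
    """Sum of squared edge-weight differences when node i of a maps to perm[i] of b."""
    n = len(perm)
    return sum(
        (a[i][j] - b[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def anneal_match(
    a: Matrix,
    b: Matrix,
    steps: int = 20000,
    t_start: float = 1.0,
    t_end: float = 1e-3,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Search for a low-mismatch permutation by simulated annealing (needs n >= 2)."""
    rng = random.Random(seed)
    n = len(a)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = mismatch(a, b, perm)
    best, best_cost = perm[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]  # propose swapping two assignments
        new_cost = mismatch(a, b, perm)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = perm[:], cost
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best, best_cost
```

Calling `anneal_match(a, b)` returns the best permutation found and its mismatch. Recomputing the full objective after every swap keeps the sketch short; an incremental update of only the affected terms would be the usual optimization for larger seed sets.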
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Data Quality and Management
