TL;DR
This paper investigates how the average degree of social networks influences re-identification privacy risks and proposes a sampling-based method to mitigate these risks during data collection.
Contribution
It identifies the average degree as a key factor in re-identification risk and introduces a simple, sampling-based anonymization method applicable at data collection.
Findings
Dense networks have higher re-identification risks.
Re-identification risk is high when the average degree exceeds 10.
The proposed sampling method effectively reduces privacy risks.
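The findings above can be illustrated with a small, hedged sketch. This summary does not specify the paper's actual risk measure or sampling procedure, so everything here is an assumption made for illustration: the risk proxy is the fraction of nodes whose structural signature (own degree plus the sorted degrees of their neighbors) is unique, the network is a simple random graph, and the "sampling" step keeps each edge independently with a fixed probability. All function names are hypothetical.

```python
import random
from collections import Counter


def random_graph(n, avg_degree, rng):
    """Random graph on n nodes; each pair is linked with probability avg_degree / (n - 1)."""
    p = avg_degree / (n - 1)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                edges.add((u, v))
    return edges


def average_degree(n, edges):
    """Average degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edges) / n


def unique_fraction(n, edges):
    """Assumed risk proxy: share of nodes with a unique
    (degree, sorted neighbor degrees) signature."""
    neighbors = {u: [] for u in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = {u: len(neighbors[u]) for u in range(n)}
    signature = {
        u: (degree[u], tuple(sorted(degree[v] for v in neighbors[u])))
        for u in range(n)
    }
    counts = Counter(signature.values())
    return sum(1 for u in range(n) if counts[signature[u]] == 1) / n


def sample_edges(edges, keep_prob, rng):
    """Assumed sampling step: keep each edge independently with probability keep_prob."""
    return {e for e in edges if rng.random() < keep_prob}


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    for k in (2, 6, 12):
        full = random_graph(n, k, rng)
        sampled = sample_edges(full, 0.3, rng)
        print(
            f"target degree {k}: "
            f"avg={average_degree(n, full):.1f}, unique={unique_fraction(n, full):.2f} | "
            f"sampled avg={average_degree(n, sampled):.1f}, "
            f"unique={unique_fraction(n, sampled):.2f}"
        )
```

Running the script prints the proxy before and after sampling for three target average degrees, which gives a rough way to compare sparse and dense graphs. It is not a reproduction of the paper's experiments, and a different signature or sampling scheme could change the numbers considerably.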
Abstract
The ability to share social network data at the level of individual connections is beneficial to science: not only for reproducing results, but also for researchers who may wish to use it for purposes not foreseen by the data releaser. Sharing such data, however, can lead to serious privacy issues, because individuals could be re-identified, not only based on possible nodes' attributes, but also from the structure of the network around them. The risk associated with re-identification can be measured and it is more serious in some networks than in others. Various optimization algorithms have been proposed to anonymize the network while keeping the number of changes minimal. However, existing algorithms do not provide guarantees on where the changes will be made, making it difficult to quantify their effect on various measures. Using network models and real data, we show that the average…
