Distribution-Agnostic Database De-Anonymization Under Synchronization Errors
Serhat Bakirtas, Elza Erkip

TL;DR
This paper develops a theoretical framework for database de-anonymization that requires no prior knowledge of the underlying data distribution, handles both synchronization errors (column repetitions) and noise, and achieves performance comparable to distribution-aware methods.
Contribution
It introduces a distribution-agnostic approach with theoretical guarantees for de-anonymization, matching the performance of distribution-aware techniques.
Findings
A seed size that is double-logarithmic in the number of rows suffices for successful deletion detection
Distribution-agnostic method matches distribution-aware performance
Theoretical guarantees are established for de-anonymization without distribution knowledge
Abstract
There has recently been an increased scientific interest in the de-anonymization of users in anonymized databases containing user-level microdata via multifarious matching strategies utilizing publicly available correlated data. Existing literature has either emphasized practical aspects where underlying data distribution is not required, with limited or no theoretical guarantees, or theoretical aspects with the assumption of complete availability of underlying distributions. In this work, we take a step towards reconciling these two lines of work by providing theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of data distribution. Motivated by time-indexed microdata, we consider database de-anonymization under both synchronization errors (column repetitions) and obfuscation (noise). By modifying the previously used replica detection…
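To make the setting in the abstract concrete, the following is a minimal sketch of how a correlated database pair with column repetitions and noise could be simulated. It is an illustration under stated assumptions, not the authors' construction: the function name `make_correlated_pair`, the i.i.d. uniform entries, the uniform repetition counts, and the symbol-replacement noise are all hypothetical choices made for this example.

```python
import random

def make_correlated_pair(n_rows, n_cols, alphabet=(0, 1, 2),
                         max_repeat=2, flip_prob=0.1, seed=0):
    """Simulate an original database and a shuffled, repeated, noisy copy.

    All modelling choices here (uniform entries, uniform repetition counts,
    replacement noise) are assumptions made for illustration only.
    """
    rng = random.Random(seed)

    # Original database: one row per user, entries drawn i.i.d. (assumed).
    original = [[rng.choice(alphabet) for _ in range(n_cols)]
                for _ in range(n_rows)]

    # Synchronization errors: one repetition count per column, shared by all
    # rows. 0 deletes the column, 1 keeps it, larger values replicate it.
    repeats = [rng.randint(0, max_repeat) for _ in range(n_cols)]

    # Anonymization: a hidden row permutation; observed[i] comes from
    # original[perm[i]].
    perm = list(range(n_rows))
    rng.shuffle(perm)

    observed = []
    for src in perm:
        row = []
        for j, count in enumerate(repeats):
            for _ in range(count):
                value = original[src][j]
                # Obfuscation: with probability flip_prob, replace the entry
                # by a uniformly drawn symbol (independently per replica).
                if rng.random() < flip_prob:
                    value = rng.choice(alphabet)
                row.append(value)
        observed.append(row)

    return original, observed, repeats, perm
```

In this sketch, a de-anonymization attack would be given `original` and `observed` and would try to recover `perm` without knowing `repeats`, the noise level, or the entry distribution.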
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Internet Traffic Analysis and Secure E-voting
