Seeded Database Matching Under Noisy Column Repetitions
Serhat Bakirtas, Elza Erkip

TL;DR
This paper develops a unified framework for database matching that accounts for obfuscation and synchronization errors, providing theoretical conditions for successful user re-identification in noisy, anonymized, time-indexed data.
Contribution
It introduces replica detection and seeded deletion detection algorithms, deriving necessary and sufficient conditions for database matching under noisy column repetitions using information-theoretic methods.
Findings
A seed size logarithmic in row size suffices for detecting all deleted columns.
The derived conditions are both necessary and sufficient for successful matching.
Insights into privacy-preserving data publication are provided.
Abstract
The re-identification or de-anonymization of users from anonymized data through matching with publicly-available correlated user data has raised privacy concerns, leading to the complementary measure of obfuscation in addition to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks are successful, either in the presence of obfuscation or synchronization errors stemming from the sampling of time-indexed databases. This paper presents a unified framework considering both obfuscation and synchronization errors and investigates the matching of databases under noisy column repetitions. By devising replica detection and seeded deletion detection algorithms, and using information-theoretic tools, sufficient conditions for successful matching are derived. It is shown that a seed size logarithmic in the row size is enough to guarantee…
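The setting the abstract describes can be pictured with a small simulation. The sketch below is illustrative only and is not the authors' algorithm: it assumes a binary database, independent bit-flip noise on every retained copy, and a hand-picked distance threshold, none of which are specified in the abstract. The function names `noisy_column_repetition` and `detect_replicas` are hypothetical. Each column is deleted (0 copies), kept (1 copy), or replicated (2 or more copies). Adjacent output columns that are noisy copies of the same source column should then have a much smaller Hamming distance than columns that come from different source columns.

```python
import numpy as np

def noisy_column_repetition(x, reps, flip_prob, rng):
    """Repeat column j of x reps[j] times (0 = deleted), flipping each bit
    of every copy independently with probability flip_prob (assumed noise)."""
    cols = []
    for j, r in enumerate(reps):
        for _ in range(r):
            flips = rng.random(x.shape[0]) < flip_prob
            cols.append(np.where(flips, 1 - x[:, j], x[:, j]))
    return np.column_stack(cols)

def detect_replicas(y, threshold):
    """Flag adjacent column pairs whose normalized Hamming distance
    falls below threshold (a heuristic stand-in for replica detection)."""
    dist = np.mean(y[:, 1:] != y[:, :-1], axis=0)
    return dist < threshold

rng = np.random.default_rng(0)
m, n, flip_prob = 2000, 8, 0.05           # assumed toy parameters
x = rng.integers(0, 2, size=(m, n))       # i.i.d. uniform binary database
reps = [1, 0, 2, 1, 3, 0, 1, 2]           # hypothetical repetition pattern

y = noisy_column_repetition(x, reps, flip_prob, rng)
y = y[rng.permutation(m)]                 # anonymization: shuffle the rows

same = detect_replicas(y, threshold=0.3)

# Ground truth: which adjacent output columns share a source column.
src = np.repeat(np.arange(n), reps)
truth = src[1:] == src[:-1]
print("all replicas identified:", bool(np.array_equal(same, truth)))
```

Row shuffling does not change column-wise Hamming distances, because the same permutation is applied to every column. This is why replicas can be found without knowing the row correspondence. Deleted columns leave no such trace in the output, which is where the seed rows mentioned in the abstract come in; that step is not sketched here.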
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Cryptography and Data Security
