Database Matching Under Column Deletions
Serhat Bakirtas, Elza Erkip

TL;DR
This paper studies the problem of matching databases with random column deletions using information theory tools, providing conditions for successful matching and a deletion detection algorithm with probabilistic guarantees.
Contribution
It introduces a theoretical framework for database matching under column deletions and proposes a deletion detection algorithm with bounds on its performance.
Findings
Partial deletion information significantly increases the achievable database growth rate for successful matching.
A batch size growing double-logarithmically with the row size suffices for deletion detection.
Conditions for successful matching are derived using information theory.
Abstract
De-anonymizing user identities by matching various forms of user data available on the internet raises privacy concerns. A fundamental understanding of the privacy leakage in such scenarios requires a careful study of conditions under which correlated databases can be matched. Motivated by synchronization errors in time indexed databases, in this work, matching of random databases under random column deletion is investigated. Adapting tools from information theory, in particular ones developed for the deletion channel, conditions for database matching in the absence and presence of deletion location information are derived, showing that partial deletion information significantly increases the achievable database growth rate for successful matching. Furthermore, given a batch of correctly-matched rows, a deletion detection algorithm that provides partial deletion information is proposed…
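As a rough illustration of the setting described in the abstract, the following Python sketch simulates noiseless column deletion and then flags candidate deleted columns from a batch of correctly matched rows. It is not the authors' algorithm. The greedy left-to-right column alignment, the function names `delete_columns` and `detect_deletions`, and the alphabet and size parameters are all assumptions made for illustration.

```python
import random

def delete_columns(db, deleted):
    """Return a copy of db (a list of rows) without the column indices in `deleted`."""
    keep = [j for j in range(len(db[0])) if j not in deleted]
    return [[row[j] for j in keep] for row in db]

def detect_deletions(batch_full, batch_deleted):
    """Hypothetical detector: greedily align columns of the two batches.

    Both batches hold the same rows in the same order (the correctly
    matched batch). A column of batch_full that cannot be aligned with the
    next unmatched column of batch_deleted is flagged as deleted.
    """
    cols_full = list(zip(*batch_full))        # columns as tuples
    cols_del = list(zip(*batch_deleted))
    flagged, j = [], 0
    for i, col in enumerate(cols_full):
        if j < len(cols_del) and col == cols_del[j]:
            j += 1                            # column i appears to be retained
        else:
            flagged.append(i)                 # column i is a deletion candidate
    return flagged

# Small simulation with assumed parameters (not taken from the paper).
rng = random.Random(0)
n_rows, n_cols, alphabet = 8, 12, 4
db = [[rng.randrange(alphabet) for _ in range(n_cols)] for _ in range(n_rows)]
deleted = {j for j in range(n_cols) if rng.random() < 0.3}
observed = delete_columns(db, deleted)
flagged = detect_deletions(db, observed)
print("true deletions:   ", sorted(deleted))
print("flagged deletions:", flagged)
```

In this sketch the number of flagged columns always equals the number of deleted columns, but their positions can be wrong when two columns of the batch happen to be identical. Adding rows to the batch makes such coincidences less likely, which gives some intuition for why the batch size matters for detection.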
