Database Matching Under Column Deletions

Serhat Bakirtas; Elza Erkip

arXiv:2105.09616·cs.IT·May 21, 2021

TL;DR

This paper uses information-theoretic tools to study the matching of random databases under random column deletions. It derives conditions for successful matching and proposes a deletion detection algorithm with probabilistic guarantees.

Contribution

The paper introduces a theoretical framework for database matching under column deletions and proposes a deletion detection algorithm with bounds on its performance.

Findings

01

Partial deletion information significantly increases the achievable database growth rate for successful matching.

02

A batch size that grows double-logarithmically with the row size suffices for deletion detection.

03

Conditions for successful matching are derived using information theory.

Abstract

De-anonymizing user identities by matching various forms of user data available on the internet raises privacy concerns. A fundamental understanding of the privacy leakage in such scenarios requires a careful study of conditions under which correlated databases can be matched. Motivated by synchronization errors in time indexed databases, in this work, matching of random databases under random column deletion is investigated. Adapting tools from information theory, in particular ones developed for the deletion channel, conditions for database matching in the absence and presence of deletion location information are derived, showing that partial deletion information significantly increases the achievable database growth rate for successful matching. Furthermore, given a batch of correctly-matched rows, a deletion detection algorithm that provides partial deletion information is proposed…
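The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method; it is a minimal toy model under stated assumptions: entries are drawn i.i.d. uniformly from a finite alphabet, each column is deleted independently with probability `delta`, rows are shuffled by a uniform permutation, and the deletion locations are fully known to the matcher. All function names and parameters (`make_database`, `delete_columns_and_shuffle`, `candidate_matches`, `delta`, `alphabet_size`) are hypothetical and chosen only for illustration.

```python
import random


def make_database(m, n, alphabet_size, rng):
    """Draw an m-by-n database with i.i.d. uniform entries (an assumption)."""
    return [[rng.randrange(alphabet_size) for _ in range(n)] for _ in range(m)]


def delete_columns_and_shuffle(db, delta, rng):
    """Delete each column independently with probability delta, then shuffle rows.

    Returns the observed database, the indices of the retained columns, and the
    hidden permutation (perm[k] is the original row behind observed row k).
    """
    n = len(db[0])
    kept = [j for j in range(n) if rng.random() >= delta]
    perm = list(range(len(db)))
    rng.shuffle(perm)
    observed = [[db[i][j] for j in kept] for i in perm]
    return observed, kept, perm


def candidate_matches(db, observed, kept):
    """With known deletion locations, list the original rows consistent with
    each observed row by comparing only the retained columns."""
    index = {}
    for i, row in enumerate(db):
        key = tuple(row[j] for j in kept)
        index.setdefault(key, []).append(i)
    return [index[tuple(row)] for row in observed]


if __name__ == "__main__":
    rng = random.Random(0)
    db = make_database(m=200, n=30, alphabet_size=2, rng=rng)
    observed, kept, perm = delete_columns_and_shuffle(db, delta=0.3, rng=rng)
    cands = candidate_matches(db, observed, kept)
    unique = sum(1 for c in cands if len(c) == 1)
    print(f"retained columns: {len(kept)}, uniquely matched rows: {unique}/{len(db)}")
```

In this toy model the true row is always among the candidates, and a row is matched unambiguously only when no other row agrees with it on every retained column. Increasing the number of rows relative to the number of retained columns makes such collisions more likely, which gives some intuition for why the abstract speaks of an achievable database growth rate.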
