Matching of Markov Databases Under Random Column Repetitions
Serhat Bakirtas, Elza Erkip

TL;DR
This paper investigates the problem of matching correlated Markov databases with random column repetitions and deletions, proposing permutation-invariant features and deriving conditions for successful matching, with implications for privacy-preserving data publication.
Contribution
It introduces a novel approach using column histograms for permutation-invariant detection and establishes the matching capacity under random column repetitions using information-theoretic analysis.
Findings
Asymptotic uniqueness of column histograms proved.
Sufficient conditions for successful database matching derived.
Matching capacity equals the erasure bound, i.e., the bound that assumes the repetition locations are known a priori.
Abstract
Matching entries of correlated shuffled databases has practical applications ranging from privacy to biology. In this paper, motivated by synchronization errors in the sampling of time-indexed databases, matching of random databases under random column repetitions and deletions is investigated. It is assumed that for each entry (row) in the database, the attributes (columns) are correlated, which is modeled as a Markov process. Column histograms are proposed as a permutation-invariant feature to detect the repetition pattern, whose asymptotic uniqueness is proved using information-theoretic tools. Repetition detection is then followed by a typicality-based row matching scheme. Considering this overall scheme, sufficient conditions for successful matching of databases in terms of the database growth rate are derived. A modified version of Fano's inequality leads to a tight necessary…
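The repetition-detection idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes noiseless column copies, a binary alphabet, and a toy database whose column histograms are all distinct (the property the paper proves asymptotically). The function names `column_histograms` and `detect_repetitions`, and the greedy left-to-right alignment, are hypothetical choices made for this illustration.

```python
import numpy as np

def column_histograms(db, alphabet_size):
    """Return an (alphabet_size x n_cols) array; column j is the histogram of db[:, j].

    A histogram counts symbol occurrences, so it does not change when rows are shuffled.
    """
    return np.stack(
        [np.bincount(db[:, j], minlength=alphabet_size) for j in range(db.shape[1])],
        axis=1,
    )

def detect_repetitions(hist_x, hist_y):
    """Greedy alignment (illustrative assumption, not the paper's detector).

    For each column of X, count how many consecutive columns of Y have the same
    histogram. Relies on the histograms of X's columns being distinct.
    """
    counts = []
    j = 0
    for i in range(hist_x.shape[1]):
        c = 0
        while j < hist_y.shape[1] and np.array_equal(hist_x[:, i], hist_y[:, j]):
            c += 1
            j += 1
        counts.append(c)
    return counts

# Toy database: 5 rows (entries), 4 columns (attributes), distinct column histograms.
X = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
])

# Hypothetical repetition pattern: column 0 twice, column 1 deleted,
# column 2 once, column 3 three times.
pattern = [2, 0, 1, 3]
Y = np.repeat(X, pattern, axis=1)

# Shuffle the rows of Y; the histograms are unaffected by this permutation.
Y = Y[[3, 0, 4, 2, 1]]

detected = detect_repetitions(column_histograms(X, 2), column_histograms(Y, 2))
print(detected)  # [2, 0, 1, 3]
```

Because the detector only looks at histograms, it recovers the repetition pattern without knowing the row permutation. In the scheme described above, row matching would follow as a separate step.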
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic processes and statistical mechanics · Cryptography and Data Security
