Towards Better Bounds for Finding Quasi-Identifiers
Ryan Hildebrant, Quoc-Tung Le, Duy-Hoang Ta, Hoa T. Vu

TL;DR
This paper improves bounds and algorithms for identifying small subsets of data coordinates that separate most pairs of tuples, enhancing efficiency in privacy-preserving data analysis.
Contribution
It introduces a tighter sample size bound of $\Theta(m/\sqrt{\epsilon})$ for finding $\epsilon$-separation keys and provides new upper and lower bounds on sampling requirements.
Findings
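The sample size can be reduced from $\Theta(m/\epsilon)$ to $\Theta(m/\sqrt{\epsilon})$ while still finding a small $\epsilon$-separation key.
Tight bounds are established on the sample size needed by sampling-based decision algorithms that must succeed with high probability.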
Analyzed a sketching algorithm's space complexity, showing lower bounds and proposing efficient sampling methods.
Abstract
We revisit the problem of finding small $\epsilon$-separation keys introduced by Motwani and Xu (2008). In this problem, the input is $m$-dimensional tuples $x_1, x_2, \ldots, x_n$. The goal is to find a small subset of coordinates that separates at least $(1-\epsilon)\binom{n}{2}$ pairs of tuples. They provided a fast algorithm that runs on $\Theta(m/\epsilon)$ tuples sampled uniformly at random. We show that the sample size can be improved to $\Theta(m/\sqrt{\epsilon})$. Our algorithm also enjoys a faster running time. To obtain this result, we provide upper and lower bounds on the sample size to solve the following decision problem. Given a subset of coordinates $A$, reject if $A$ separates fewer than $(1-\epsilon)\binom{n}{2}$ pairs, and accept if $A$ separates all pairs. The algorithm must be correct with probability at least $1-\delta$ for all $A$. We show that for algorithms…
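The decision problem in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a minimal sampling test under stated assumptions: tuples are Python tuples, the sample is drawn without replacement, and the test accepts exactly when every sampled tuple has a distinct projection onto the chosen coordinates. The function names (`project`, `separated_fraction`, `sample_accepts`) and the choice of sample size are hypothetical; the paper's bound says a sample of order $m/\sqrt{\epsilon}$ suffices, with constants not given here.

```python
import random
from collections import Counter


def project(t, coords):
    """Restrict tuple t to the coordinates in coords."""
    return tuple(t[i] for i in coords)


def separated_fraction(tuples, coords):
    """Exact fraction of pairs separated by coords (uses every tuple)."""
    n = len(tuples)
    total_pairs = n * (n - 1) // 2
    groups = Counter(project(t, coords) for t in tuples)
    # Two tuples are NOT separated exactly when their projections coincide.
    unseparated = sum(c * (c - 1) // 2 for c in groups.values())
    return (total_pairs - unseparated) / total_pairs


def sample_accepts(tuples, coords, sample_size, rng):
    """Sampling test (illustrative): accept iff the sampled tuples all
    have distinct projections onto coords. Sampling is without
    replacement, which is an assumption of this sketch."""
    k = min(sample_size, len(tuples))
    sample = rng.sample(tuples, k)
    projections = [project(t, coords) for t in sample]
    return len(set(projections)) == len(projections)


# Toy data: coordinate 0 is unique per tuple, coordinate 1 takes two values.
data = [(i, i % 2) for i in range(100)]
rng = random.Random(0)

frac_key = separated_fraction(data, [0])   # separates every pair
frac_weak = separated_fraction(data, [1])  # separates roughly half the pairs
accept_key = sample_accepts(data, [0], 10, rng)
accept_weak = sample_accepts(data, [1], 10, rng)
```

In this toy data, coordinate 0 separates all pairs, so the test can never reject it. Coordinate 1 has only two distinct values, so any sample of three or more tuples contains a collision and the test rejects it. The error of such a test is one-sided: it can only go wrong by accepting a subset that separates too few pairs, which is why the sample size governs the failure probability $\delta$.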
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · DNA and Biological Computing
