Towards Better Bounds for Finding Quasi-Identifiers

Ryan Hildebrant; Quoc-Tung Le; Duy-Hoang Ta; Hoa T. Vu

arXiv:2211.13882·cs.DS·April 14, 2023

TL;DR

This paper improves bounds and algorithms for identifying small subsets of data coordinates that separate most pairs of tuples, enhancing efficiency in privacy-preserving data analysis.

Contribution

It tightens the sample size bound for finding $ϵ$-separation keys to $Θ(m / \sqrt{ϵ})$ and provides new upper and lower bounds on the sample size required by the associated decision problem.

Findings

01

The sample size can be reduced from $Θ(m / ϵ)$ to $Θ(m / \sqrt{ϵ})$ while still finding a small $ϵ$-separation key.

02

Upper and lower bounds are established on the sample size needed to solve the decision problem with probability at least $1 - δ$.

03

Analyzed a sketching algorithm's space complexity, showing lower bounds and proposing efficient sampling methods.

Abstract

We revisit the problem of finding small $ϵ$-separation keys introduced by Motwani and Xu (2008). In this problem, the input is $m$-dimensional tuples $x_{1}, x_{2}, \dots, x_{n}$. The goal is to find a small subset of coordinates that separates at least $(1 - ϵ) \binom{n}{2}$ pairs of tuples. They provided a fast algorithm that runs on $Θ (m / ϵ)$ tuples sampled uniformly at random. We show that the sample size can be improved to $Θ (m / \sqrt{ϵ})$. Our algorithm also enjoys a faster running time. To obtain this result, we provide upper and lower bounds on the sample size to solve the following decision problem. Given a subset of coordinates $A$, reject if $A$ separates fewer than $(1 - ϵ) \binom{n}{2}$ pairs, and accept if $A$ separates all pairs. The algorithm must be correct with probability at least $1 - δ$ for all $A$. We show that for algorithms…
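The decision problem in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `separated_fraction` and `sample_test`, the choice to sample without replacement, and the `sample_size` parameter are assumptions made for illustration, and the sketch does not encode the paper's constants or bounds. A subset of coordinates $A$ separates a pair of tuples if they differ on at least one coordinate in $A$; the sampling test accepts when all sampled tuples have distinct projections onto $A$, so it always accepts an $A$ that separates all pairs.

```python
import random
from itertools import combinations


def separated_fraction(tuples, coords):
    """Exact fraction of tuple pairs that differ on at least one coordinate in coords."""
    n = len(tuples)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0:
        return 1.0
    separated = sum(
        1
        for x, y in combinations(tuples, 2)
        if any(x[i] != y[i] for i in coords)
    )
    return separated / total_pairs


def sample_test(tuples, coords, sample_size, rng=None):
    """Accept iff every sampled tuple has a distinct projection onto coords.

    Assumption: tuples are sampled uniformly without replacement, and
    sample_size is supplied by the caller (hypothetical parameter).
    """
    rng = rng or random.Random()
    k = min(sample_size, len(tuples))
    sample = rng.sample(tuples, k)
    projections = [tuple(x[i] for i in coords) for x in sample]
    # Two equal projections mean the sample contains an unseparated pair.
    return len(set(projections)) == len(projections)
```

The exact check in `separated_fraction` inspects all $\binom{n}{2}$ pairs and is only practical for small inputs; `sample_test` looks at the sampled tuples alone, which is the setting in which the sample size bounds apply.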

Code & Models

Repositories

ryanhilde/min_set_cover
none · Official

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms · DNA and Biological Computing