Private Synthetic Data Generation in Bounded Memory
Rayne Holland, Seyit Camtepe, Chandra Thapa, Minhui Xue

TL;DR
PrivHP is a lightweight, differentially private synthetic data generator that uses hierarchical decomposition and pruning to efficiently approximate data distributions within bounded memory, balancing privacy, utility, and space.
Contribution
The paper introduces PrivHP, a novel hierarchical, differentially private synthetic data generator that uses pruning and private sketches and offers a tunable trade-off between space and utility.
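PrivHP relies on private sketches to estimate subdomain frequencies. The example below is a generic illustration and assumes a Count-Min sketch with Laplace noise. That is a standard construction, not necessarily the one the authors use. The class `NoisyCountMin` and its methods are hypothetical. Subdomains are keyed by integer IDs, noise is added once after all insertions, and point queries take the minimum across rows.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class NoisyCountMin:
    """Count-Min sketch over integer keys with Laplace-noised counters.

    Hypothetical illustration, not PrivHP's sketch. Each insertion touches one
    counter per row, so adding or removing one item changes the whole table by
    at most `depth` in L1 norm; Laplace noise of scale depth / epsilon on every
    counter therefore makes the released table epsilon-differentially private.
    """

    PRIME = 2_305_843_009_213_693_951  # 2**61 - 1, modulus of the universal hash

    def __init__(self, width: int, depth: int, seed: int = 0):
        self.width, self.depth = width, depth
        rng = random.Random(seed)  # hash parameters are chosen independently of the data
        self.hashes = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def _col(self, row: int, key: int) -> int:
        a, b = self.hashes[row]
        return ((a * key + b) % self.PRIME) % self.width

    def add(self, key: int) -> None:
        for row in range(self.depth):
            self.table[row][self._col(row, key)] += 1

    def privatize(self, epsilon: float, rng: random.Random) -> None:
        """Add noise once, after all insertions and before releasing the table."""
        scale = self.depth / epsilon
        for row in self.table:
            for col in range(self.width):
                row[col] += laplace(scale, rng)

    def estimate(self, key: int) -> float:
        """Point query: minimum over the counters this key hashes to."""
        return min(self.table[row][self._col(row, key)] for row in range(self.depth))
```

Before noise is added, `estimate` never undercounts. Once noise is added, that one-sided guarantee no longer holds, and taking the minimum over noisy rows biases estimates downward. The width controls the space spent per row, which mirrors the space/utility trade-off described above.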
Findings
Achieves differential privacy with space complexity O(k log^2 |X|).
Provides utility bounds that account for the hierarchy, the added noise, and pruning.
Bounds the expected Wasserstein distance of the synthetic data distribution from the empirical distribution.
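The utility guarantee is stated as an expected Wasserstein distance from the empirical distribution. For one-dimensional data, the Wasserstein-1 distance equals the area between the two CDFs, ∫|F(x) − G(x)| dx. The hypothetical helper below computes this exactly for two finite samples. It only illustrates how the quantity can be measured empirically. It is not the paper's analysis, which covers its own input domain and dimension.

```python
from bisect import bisect_right
from typing import Sequence


def wasserstein1(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact W1 distance between the empirical distributions of xs and ys (1-D)."""
    sx, sy = sorted(xs), sorted(ys)
    points = sorted(set(sx) | set(sy))
    total = 0.0
    # Both empirical CDFs are right-continuous step functions, constant on each
    # [left, right) between consecutive breakpoints, so integrate piecewise.
    for left, right in zip(points, points[1:]):
        f = bisect_right(sx, left) / len(sx)
        g = bisect_right(sy, left) / len(sy)
        total += abs(f - g) * (right - left)
    return total
```

For example, `wasserstein1(real_sample, synthetic_sample)` compares a generator's output with the data it was built from. Averaging over repeated runs estimates the expected distance.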
Abstract
We propose PrivHP, a lightweight synthetic data generator with differential privacy guarantees. PrivHP uses a novel hierarchical decomposition that approximates the input's cumulative distribution function (CDF) in bounded memory. It balances hierarchy depth, noise addition, and pruning of low-frequency subdomains while preserving frequent ones. Private sketches estimate subdomain frequencies efficiently without full data access. A key feature is the pruning parameter k, which controls the trade-off between space and utility. We define a skew measure capturing all but the top k subdomain frequencies. Given a dataset X, PrivHP uses O(k log^2 |X|) space and, for its input domain, ensures differential privacy. It yields a generator with expected Wasserstein distance: \[…
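The following is a minimal one-dimensional sketch of the ideas in the abstract. It is not the authors' algorithm, and `build_pruned_tree` and `sample` are hypothetical names. It rests on several assumptions. The data lie in [0, 1), and the hierarchy is a binary dyadic tree of fixed depth. The privacy budget is split evenly across levels. Pruning keeps only the k children with the largest noisy counts at each level for further refinement. For readability, exact counting in one pass per level stands in for the private sketches, so this version does not have the bounded-memory property. Only the stored tree is O(k · depth) in size.

```python
import random
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (level, index) covers [index / 2**level, (index + 1) / 2**level)


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_pruned_tree(data: List[float], depth: int, k: int,
                      epsilon: float, rng: random.Random) -> Dict[Node, float]:
    """Noisy counts for every retained node at levels 1..depth (hypothetical)."""
    # Nodes on one level are disjoint, so one item changes at most one count by 1:
    # scale depth / epsilon per level gives epsilon overall by composition.
    scale = depth / epsilon
    tree: Dict[Node, float] = {}
    frontier = [0]  # indices of the nodes refined at the previous level
    for level in range(1, depth + 1):
        children = [2 * i + b for i in frontier for b in (0, 1)]
        exact = dict.fromkeys(children, 0)
        for x in data:
            idx = min(int(x * 2 ** level), 2 ** level - 1)
            if idx in exact:
                exact[idx] += 1
        for idx in children:
            tree[(level, idx)] = exact[idx] + laplace(scale, rng)
        # Prune: only the k children with the largest noisy counts are refined.
        # The choice depends on noisy values only, so it is post-processing.
        frontier = sorted(children, key=lambda i: tree[(level, i)], reverse=True)[:k]
    return tree


def sample(tree: Dict[Node, float], depth: int, rng: random.Random) -> float:
    """Descend by (clamped) noisy counts, then sample uniformly inside the leaf."""
    level, index = 0, 0
    while level < depth:
        kids = [(level + 1, 2 * index + b) for b in (0, 1)]
        if kids[0] not in tree:
            break  # pruned node: its subtree was never refined
        weights = [max(tree[kid], 0.0) for kid in kids]  # clamp negative noise
        if sum(weights) == 0.0:
            break
        level, index = rng.choices(kids, weights=weights)[0]
    width = 2.0 ** -level
    return (index + rng.random()) * width
```

Here depth, noise, and pruning interact as the abstract describes. A deeper tree resolves the CDF more finely but spreads the budget thinner per level. A larger k keeps more subdomains refined at the cost of space. Releasing the noisy tree is differentially private under these assumptions, and every call to `sample` is post-processing of it.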
Taxonomy
Topics: Cryptography and Data Security · Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data
