Partition-based differentially private synthetic data generation
Meifan Zhang, Dihang Deng, Lihua Yin

TL;DR
This paper introduces a partition-based method for generating differentially private synthetic data. It reduces measurement error and improves data quality, and it outperforms existing approaches, especially under limited privacy budgets.
Contribution
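The proposed approach uses partitioning to reduce errors and improve synthetic data quality. It addresses two limitations of current select-measure-generate methods: the large error incurred when measuring large-domain marginals, and the difficulty of allocating the privacy budget across iterations.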
Findings
Outperforms existing methods in data quality and utility
Reduces errors in large domain marginals
Maintains high data utility with limited privacy budget
Abstract
Private synthetic data sharing is preferred as it keeps the distribution and nuances of original data compared to summary statistics. The state-of-the-art methods adopt a select-measure-generate paradigm, but measuring large domain marginals still results in much error and allocating privacy budget iteratively is still difficult. To address these issues, our method employs a partition-based approach that effectively reduces errors and improves the quality of synthetic data, even with a limited privacy budget. Results from our experiments demonstrate the superiority of our method over existing approaches. The synthetic data produced using our approach exhibits improved quality and utility, making it a preferable choice for private synthetic data sharing.
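To make the abstract's point about large-domain marginals concrete, the following is a minimal sketch of the "measure" step only, using the standard Laplace mechanism. It is not the paper's partition-based method. The attribute names, domains, and helper functions are hypothetical, and it assumes add/remove-one-record neighbouring datasets, so that a histogram has L1 sensitivity 1.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domains, epsilon, rng):
    """Noisy counts for every cell of the marginal over `attrs`.

    Assumes add/remove-one-record neighbouring datasets, so the
    histogram has L1 sensitivity 1 and the noise scale is 1 / epsilon.
    """
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    cells = product(*(domains[a] for a in attrs))
    scale = 1.0 / epsilon
    return {cell: counts[cell] + laplace_noise(scale, rng) for cell in cells}


def expected_l1_error(attrs, domains, epsilon):
    # E|Laplace(0, b)| = b, so the expected total error over the
    # marginal is (number of cells) * (1 / epsilon).
    n_cells = 1
    for a in attrs:
        n_cells *= len(domains[a])
    return n_cells / epsilon


# Toy data with hypothetical attribute names and domains.
domains = {"age_group": range(10), "zip3": range(100)}
rng = random.Random(0)
records = [
    {a: rng.choice(list(d)) for a, d in domains.items()} for _ in range(1000)
]

small = noisy_marginal(records, ["age_group"], domains, epsilon=0.5, rng=rng)
large = noisy_marginal(
    records, ["age_group", "zip3"], domains, epsilon=0.5, rng=rng
)
```

In this toy setting the two-way marginal has 1,000 cells, so at epsilon = 0.5 its expected total noise (2,000) is twice the number of records (1,000), while the one-way marginal over 10 cells has expected total noise of only 20. The noise per cell is the same in both cases. What grows is the number of cells and how thinly the records are spread across them.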
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Traffic Prediction and Management Techniques
