One Step to Efficient Synthetic Data
Jordan Awan, Zhanrui Cai

TL;DR
This paper introduces a computationally efficient method for generating synthetic data from parametric models. The synthetic data have asymptotically efficient summary statistics and a distribution that converges to the true one, and fully synthetic versions can satisfy differential privacy. This improves on the common approach of sampling from a fitted model.
Contribution
The authors propose a general, widely applicable synthetic data generation method that achieves asymptotic efficiency, consistency, and differential privacy, with theoretical and empirical validation.
Findings
The new method produces asymptotically efficient and consistent synthetic data.
It can generate both partially and fully private synthetic datasets.
The approach is computationally efficient and adaptable to various parametric models.
Abstract
A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient summary statistics, and is both easily implemented and highly computationally efficient. Our approach allows for the construction of both partially synthetic datasets, which preserve certain summary statistics, as well as fully synthetic data which satisfy the strong guarantee of differential privacy (DP), both with the same asymptotic guarantees. We also provide theoretical and empirical evidence that the distribution from our procedure converges to the true distribution. Besides our focus on…
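The inefficiency claim in the abstract can be illustrated with a small simulation. The sketch below assumes a toy normal location model, X_i ~ N(theta, 1), chosen here only for illustration; it is not taken from the paper. It compares the sample mean computed on real data with the sample mean computed on data drawn from the fitted model, and then applies a simple recentering step that reuses the same noise draws so that the synthetic mean matches the original estimate. The recentering step is a hypothetical device for this toy model and should not be read as the authors' algorithm.

```python
import random
import statistics


def simulate(n=200, reps=2000, theta=1.0, seed=0):
    """Compare estimator variance on real, naive synthetic, and recentered data.

    Toy model (an assumption for illustration): X_i ~ N(theta, 1).
    Returns the Monte Carlo variance of the sample mean in each case.
    """
    rng = random.Random(seed)
    real_est, naive_est, recentered_est = [], [], []
    for _ in range(reps):
        x = [rng.gauss(theta, 1.0) for _ in range(n)]
        theta_hat = statistics.fmean(x)  # MLE of theta on the real data

        # Naive synthetic data: sample from the fitted model N(theta_hat, 1).
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [theta_hat + zi for zi in z]
        y_bar = statistics.fmean(y)

        # Hypothetical recentering: keep the same noise z and shift the
        # parameter by the observed discrepancy theta_hat - y_bar.
        theta_adj = theta_hat + (theta_hat - y_bar)
        y_adj = [theta_adj + zi for zi in z]

        real_est.append(theta_hat)
        naive_est.append(y_bar)
        recentered_est.append(statistics.fmean(y_adj))
    return (
        statistics.pvariance(real_est),
        statistics.pvariance(naive_est),
        statistics.pvariance(recentered_est),
    )


var_real, var_naive, var_recentered = simulate()
print(f"real: {var_real:.5f}  naive: {var_naive:.5f}  recentered: {var_recentered:.5f}")
```

In this toy model the naive synthetic mean equals the fitted mean plus independent sampling noise, so its variance is about 2/n rather than 1/n. The recentered sample reproduces the original estimate up to floating-point error, which is the kind of summary-statistic preservation the abstract describes for partially synthetic data.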
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Probability and Risk Models · Statistical Methods and Bayesian Inference
