Differentially Private Synthetic Data: Applied Evaluations and Enhancements
Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, Joshua Allen

TL;DR
This paper evaluates four differentially private generative adversarial networks for synthetic data creation, benchmarking their performance on tabular datasets and industry scenarios, and introduces QUAIL, an ensemble approach that can outperform baseline models under certain conditions.
Contribution
It provides a comprehensive evaluation of DP-GANs on real datasets, introduces the QUAIL ensemble method, and discusses tradeoffs in privacy-utility balance for synthetic data generation.
Findings
Some synthesizers perform better at specific privacy budgets.
QUAIL can outperform baseline models in certain scenarios.
Tradeoffs exist between privacy levels and data utility.
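To make the privacy-utility tradeoff concrete, here is a minimal sketch using the standard Laplace mechanism on a single counting query. This is not one of the paper's DP-GAN synthesizers and not QUAIL; it is a generic illustration, and the function names (`laplace_noise`, `private_count`, `mean_abs_error`) and the example count of 1000 are assumptions made for this sketch. The only fact it relies on is that the Laplace mechanism adds noise with scale sensitivity / epsilon, so a smaller privacy budget (epsilon) means more noise and lower utility.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    # A counting query has sensitivity 1 (one person changes the count by at most 1).
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    # Average absolute error of the noisy answer over repeated releases.
    # (Illustrative only: repeated releases would consume more budget in practice.)
    rng = random.Random(seed)
    errors = [
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean abs error = {mean_abs_error(1000, eps):.3f}")
```

The expected absolute error of Laplace noise equals its scale, so the error shrinks roughly in proportion to 1 / epsilon. Synthetic data generators face the same tension in a more complex form: a tighter budget protects individuals more strongly but leaves less signal for downstream models.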
Abstract
Machine learning practitioners frequently seek to leverage the most informative available data, without violating the data owner's privacy, when building predictive models. Differentially private data synthesis protects personal details from exposure, and allows for the training of differentially private machine learning models on privately generated datasets. But how can we effectively assess the efficacy of differentially private synthetic data? In this paper, we survey four differentially private generative adversarial networks for data synthesis. We evaluate each of them at scale on five standard tabular datasets, and in two applied industry scenarios. We benchmark with novel metrics from recent literature and other standard machine learning tools. Our results suggest some synthesizers are more applicable for different privacy budgets, and we further demonstrate complicating…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
