TL;DR
This paper evaluates and compares various differentially private synthetic data algorithms from a NIST challenge, providing insights into their accuracy, usability, and implications for policy and practice.
Contribution
It offers a comprehensive evaluation of multiple algorithms using real challenge data, introduces new utility metrics, and discusses their suitability for policy use.
Findings
Certain algorithms outperform others in data utility.
New utility metrics provide better assessment of synthetic data quality.
The paper recommends algorithms according to use case.
Abstract
Differentially private synthetic data generation offers a recent solution to release analytically useful data while preserving the privacy of individuals in the data. In order to utilize these algorithms for public policy decisions, policymakers need an accurate understanding of these algorithms' comparative performance. Correspondingly, data practitioners require standard metrics for evaluating the analytic qualities of the synthetic data. In this paper, we present an in-depth evaluation of several differentially private synthetic data algorithms using actual differentially private synthetic data sets created by contestants in the 2018-2019 National Institute of Standards and Technology Public Safety Communications Research (NIST PSCR) Division's "Differential Privacy Synthetic Data Challenge." We offer analyses of these algorithms based on both the accuracy of the data they created…
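To make the idea of a utility metric for synthetic data concrete, here is a minimal sketch of one generic approach: compare the k-way marginal distributions of the original and synthetic records using total variation distance. This is an illustration only and is not claimed to be one of the metrics used in the paper. The function names (`marginal`, `tv_distance`, `mean_marginal_error`) and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def marginal(rows, cols):
    """Empirical distribution over the value tuples of the given columns."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mean_marginal_error(original, synthetic, columns, k=2):
    """Average TV distance over all k-way marginals of the listed columns.

    0.0 means every k-way marginal matches exactly; 1.0 means none overlap.
    """
    subsets = list(combinations(columns, k))
    return sum(
        tv_distance(marginal(original, s), marginal(synthetic, s))
        for s in subsets
    ) / len(subsets)


# Toy records, invented for illustration (not data from the challenge).
original = [
    {"age": "young", "sex": "F", "job": "yes"},
    {"age": "young", "sex": "M", "job": "no"},
    {"age": "old", "sex": "F", "job": "yes"},
    {"age": "old", "sex": "M", "job": "yes"},
]
synthetic = [
    {"age": "young", "sex": "F", "job": "yes"},
    {"age": "young", "sex": "F", "job": "no"},
    {"age": "old", "sex": "M", "job": "yes"},
    {"age": "old", "sex": "M", "job": "yes"},
]

score = mean_marginal_error(original, synthetic, ["age", "sex", "job"], k=2)
print(f"mean 2-way marginal error: {score:.3f}")
```

A lower score suggests the synthetic data preserves low-order statistics of the original more closely. A metric of this kind says nothing about the privacy guarantee itself, which is determined by the generating algorithm and its privacy parameters.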
