Synthetic Data: Revisiting the Privacy-Utility Trade-off
Fatima Jahan Sarmin, Atiquer Rahman Sarkar, Yang Wang, Noman Mohammed

TL;DR
This paper critically examines a recent article's claim that synthetic data does not offer a better privacy-utility trade-off than traditional anonymization methods. It finds that the article's refutation relied on a highly specialized, constrained experimental setup, and it reaffirms synthetic data's advantages.
Contribution
The study validates the privacy-utility benefits of synthetic data in general settings and clarifies misconceptions caused by prior specialized analyses.
Findings
Synthetic data offers better privacy-utility trade-offs than k-anonymization.
The previously claimed breaches of the differential privacy guarantees of PATE-GAN and PrivBayes stem from a constrained experimental setup.
In general environments, synthetic data maintains differential privacy guarantees and utility benefits.
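The first finding compares synthetic data against k-anonymization. As background, a table is k-anonymous when every combination of quasi-identifier values (attributes such as age or ZIP code that could be linked to outside data) is shared by at least k records. The following is a minimal sketch of that check, assuming records are Python dicts and the quasi-identifier columns are chosen in advance; the function name `is_k_anonymous` and the sample table are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def is_k_anonymous(records: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """Return True if every quasi-identifier combination occurs in at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Group records into equivalence classes keyed by their quasi-identifier values.
    class_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # The table is k-anonymous only if every equivalence class has at least k members.
    return all(size >= k for size in class_sizes.values())


# Hypothetical table after generalizing age into ranges and masking ZIP digits.
records = [
    {"age": "20-29", "zip": "130**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "130**", "diagnosis": "cold"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "148**", "diagnosis": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

The sensitive column (`diagnosis`) is deliberately not a quasi-identifier here. k-anonymity constrains only how records group on the quasi-identifiers, so it gives no formal bound comparable to a differential privacy guarantee.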
Abstract
Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATE-GAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate the study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our exploration also…
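The abstract refers to a "privacy game" without describing it. A common template for such games is membership inference. A challenger secretly decides whether a target record is included in the training data, releases synthetic data, and an adversary guesses that decision. The adversary's advantage is 2 × accuracy − 1, and ε-differential privacy caps it. The following is a toy sketch under loudly stated assumptions: a Laplace-noised histogram stands in for the synthesizer (it is neither PATE-GAN nor PrivBayes), the single-threshold adversary is hypothetical, and the names `dp_histogram_synthesizer` and `membership_game` are illustrative. None of it reproduces the article's or the paper's implementation.

```python
import random
from collections import Counter
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram_synthesizer(data: List[int], domain: int, epsilon: float,
                             n_out: int, rng: random.Random) -> List[int]:
    # Hypothetical toy synthesizer: a histogram has L1 sensitivity 1 when one record is
    # added or removed, so Laplace(1/epsilon) noise on the counts gives epsilon-DP.
    counts = Counter(data)
    noisy = [max(0.0, counts[v] + laplace_noise(1.0 / epsilon, rng)) for v in range(domain)]
    # Clamping and sampling are post-processing, so the DP guarantee is preserved.
    weights = noisy if sum(noisy) > 0 else [1.0] * domain
    return rng.choices(range(domain), weights=weights, k=n_out)


def membership_game(base: List[int], target: int, domain: int, epsilon: float,
                    trials: int, seed: int = 0) -> float:
    # Returns the empirical adversary advantage, 2 * accuracy - 1.
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        b = rng.randint(0, 1)  # challenger's secret bit: 1 means the target is included
        data = base + [target] if b == 1 else list(base)
        synth = dp_histogram_synthesizer(data, domain, epsilon, n_out=len(base), rng=rng)
        # Hypothetical adversary: guess "included" if the target value shows up more
        # often in the synthetic data than it does in the known base data.
        guess = 1 if synth.count(target) > base.count(target) else 0
        correct += guess == b
    return 2 * correct / trials - 1


# Value 9 never appears in the base data, so the target is a unique (outlier) record.
base = [i % 9 for i in range(99)]
adv_high = membership_game(base, target=9, domain=10, epsilon=10.0, trials=2000)
adv_low = membership_game(base, target=9, domain=10, epsilon=0.1, trials=2000)
print(f"advantage at epsilon=10: {adv_high:.3f}, at epsilon=0.1: {adv_low:.3f}")
```

Changing the target record, the base data, or the adversary changes the measured advantage. For that reason, the setup of such a game matters when interpreting its results.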
Taxonomy
Topics: Privacy-Preserving Technologies in Data
