SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation
Yunsung Chung, Yunbei Zhang, Nassir Marrouche, Jihun Hamm

TL;DR
This survey evaluates whether synthetic images can replace real data by systematically analyzing various generation methods, privacy risks, and utility-privacy tradeoffs through empirical benchmarking and attack assessments.
Contribution
It provides a comprehensive categorization of synthetic image generation techniques, benchmarks multiple methods, and assesses privacy risks using membership inference attacks.
Findings
Synthetic images can effectively replace real data in some cases, but whether they do depends on the generation method.
Privacy mitigations can improve privacy but may reduce data utility.
Certain generative models outperform others in balancing utility and privacy.
Abstract
Advances in generative models have transformed the field of synthetic image generation for privacy-preserving data synthesis (PPDS). However, the field lacks a comprehensive survey and comparison of synthetic image generation methods across diverse settings. In particular, when synthetic images are generated to train a classifier, there is a generation-sampling-classification pipeline that takes private training data as input and outputs the final classifier of interest. In this survey, we systematically categorize existing image synthesis methods, privacy attacks, and mitigations along this generation-sampling-classification pipeline. To empirically compare diverse synthesis approaches, we provide a benchmark with representative generative methods and use model-agnostic membership inference attacks (MIAs) as a measure of privacy risk. Through this study, we seek to…
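To make the idea of a model-agnostic MIA concrete, here is a minimal sketch of one attack that needs only the released synthetic set: score each candidate record by its distance to the nearest synthetic sample, then summarize the separation between members and non-members as an AUC. This is an illustrative assumption, not necessarily the attack used in the paper's benchmark. The function names (nn_distance_scores, attack_auc) and the toy "memorizing generator" data are hypothetical.

```python
import numpy as np

def nn_distance_scores(candidates, synthetic):
    """Membership score = negative distance to the nearest synthetic sample.

    Higher score means the candidate lies closer to the synthetic set and is
    therefore guessed to be a training member. Uses only the synthetic data,
    so it makes no assumption about the generator (model-agnostic).
    """
    c = np.asarray(candidates, dtype=np.float64).reshape(len(candidates), -1)
    s = np.asarray(synthetic, dtype=np.float64).reshape(len(synthetic), -1)
    # Pairwise squared Euclidean distances, shape (n_candidates, n_synthetic).
    d2 = ((c[:, None, :] - s[None, :, :]) ** 2).sum(axis=2)
    return -np.sqrt(d2.min(axis=1))

def attack_auc(member_scores, nonmember_scores):
    """AUC as the probability that a member outscores a non-member (ties count half)."""
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())

# Toy data (hypothetical): a "generator" that nearly copies its training set.
rng = np.random.default_rng(0)
members = rng.normal(size=(50, 16))        # private training records
nonmembers = rng.normal(size=(50, 16))     # held-out records, same distribution
synthetic = members + 0.05 * rng.normal(size=members.shape)

auc = attack_auc(nn_distance_scores(members, synthetic),
                 nn_distance_scores(nonmembers, synthetic))
print(f"attack AUC: {auc:.3f}")  # near 0.5 = little leakage; near 1.0 = strong leakage
```

An AUC near 0.5 means the attacker cannot tell members from non-members. The memorizing toy generator above is built to leak, so its AUC should be close to 1. For real images, distances would more plausibly be computed in a feature space than on raw pixels; that choice is left open here.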
