SoK: Can Synthetic Images Replace Real Data? A Survey of Utility and Privacy of Synthetic Image Generation

Yunsung Chung; Yunbei Zhang; Nassir Marrouche; Jihun Hamm

arXiv:2506.19360 · cs.CR · June 27, 2025

TL;DR

This survey evaluates whether synthetic images can replace real data by systematically analyzing various generation methods, privacy risks, and utility-privacy tradeoffs through empirical benchmarking and attack assessments.

Contribution

It provides a comprehensive categorization of synthetic image generation techniques, benchmarks multiple methods, and assesses privacy risks using membership inference attacks.

Findings

01

Synthetic images can sometimes effectively replace real data depending on the method.

02

Privacy mitigations can improve privacy but may reduce data utility.

03

Certain generative models outperform others in balancing utility and privacy.

Abstract

Advances in generative models have transformed the field of synthetic image generation for privacy-preserving data synthesis (PPDS). However, the field lacks a comprehensive survey and comparison of synthetic image generation methods across diverse settings. In particular, when synthetic images are generated to train a classifier, there is a generation-sampling-classification pipeline that takes private training data as input and outputs the final classifier of interest. In this survey, we systematically categorize existing image synthesis methods, privacy attacks, and mitigations along this generation-sampling-classification pipeline. To empirically compare diverse synthesis approaches, we provide a benchmark with representative generative methods and use model-agnostic membership inference attacks (MIAs) as a measure of privacy risk. Through this study, we seek to…
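
To make the idea of a model-agnostic MIA concrete, here is a minimal Python sketch of one simple variant: a nearest-neighbor distance attack that only looks at the released synthetic samples, not at the generator. This is an illustration under stated assumptions, not the attack or benchmark used in the paper. The function names (`nearest_synthetic_distance`, `attack_auc`, `distance_mia`), the toy Gaussian "images", and the two toy "generators" are all hypothetical.

```python
import numpy as np


def nearest_synthetic_distance(candidates, synthetic):
    """Euclidean distance from each candidate to its closest synthetic sample."""
    # Pairwise differences, shape (n_candidates, n_synthetic, n_features).
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    return dists.min(axis=1)


def attack_auc(member_scores, nonmember_scores):
    """AUC: chance that a random member outscores a random non-member (ties count half)."""
    m = member_scores[:, None]
    n = nonmember_scores[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))


def distance_mia(members, nonmembers, synthetic):
    """Score = negative nearest-neighbor distance, so closer means 'more likely a member'."""
    member_scores = -nearest_synthetic_distance(members, synthetic)
    nonmember_scores = -nearest_synthetic_distance(nonmembers, synthetic)
    return attack_auc(member_scores, nonmember_scores)


# Toy data (assumption): flattened "images" drawn from a standard Gaussian.
rng = np.random.default_rng(0)
members = rng.normal(size=(200, 16))      # records used to fit the generator
nonmembers = rng.normal(size=(200, 16))   # held-out records from the same distribution

# Hypothetical generator A memorizes: it emits training records plus tiny noise.
synthetic_memorized = members + 0.01 * rng.normal(size=members.shape)
# Hypothetical generator B is independent of the training records.
synthetic_fresh = rng.normal(size=(200, 16))

auc_memorized = distance_mia(members, nonmembers, synthetic_memorized)
auc_fresh = distance_mia(members, nonmembers, synthetic_fresh)
print(f"AUC, memorizing generator:  {auc_memorized:.3f}")
print(f"AUC, independent generator: {auc_fresh:.3f}")
```

In this toy setup, an AUC near 0.5 means the attacker cannot tell members from non-members, while an AUC near 1.0 signals leakage. Real image benchmarks would typically measure distances in a learned feature space rather than raw pixel space; that choice is left out of this sketch.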
