Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
Krishnakant Singh, Thanush Navaratnam, Jannik Holmer, Simone Schaub-Meyer, Stefan Roth

TL;DR
This paper benchmarks the robustness of models trained with synthetic images across various metrics, revealing strengths in shape and background bias but vulnerabilities to noise, and highlights the benefits of combining real and synthetic data.
Contribution
It provides the first comprehensive benchmark of synthetic clone models' robustness, comparing supervised, self-supervised, and multi-modal approaches across multiple metrics.
Findings
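Synthetic self-supervised and multi-modal clones match or outperform real-image baselines on robustness metrics such as shape bias and background bias.
Synthetic clones are more vulnerable to adversarial and real-world noise than models trained on real images.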
Combining real and synthetic data enhances model robustness.
Abstract
A long-standing challenge in developing machine learning approaches has been the lack of high-quality labeled data. Recently, models trained with purely synthetic data, here termed synthetic clones, generated using large-scale pre-trained diffusion models have shown promising results in overcoming this annotation bottleneck. As these synthetic clone models progress, they are likely to be deployed in challenging real-world settings, yet their suitability remains understudied. Our work addresses this gap by providing the first benchmark for three classes of synthetic clone models, namely supervised, self-supervised, and multi-modal ones, across a range of robustness measures. We show that existing synthetic self-supervised and multi-modal clones are comparable to or outperform state-of-the-art real-image baselines for a range of robustness metrics - shape bias, background bias,…
Taxonomy
Topics: Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage · AI in cancer detection
Methods: Diffusion
