On the Equivalency, Substitutability, and Flexibility of Synthetic Data
Che-Jui Chang, Danrui Li, Seonghyeon Moon, Mubbasir Kapadia

TL;DR
This paper empirically evaluates synthetic data's effectiveness in training perception models, demonstrating its potential to replace significant portions of real data and highlighting the importance of flexible data generation for domain adaptation.
Contribution
The paper systematically investigates three properties of synthetic data (equivalency, substitutability, and flexibility) and provides empirical evidence of how each affects model performance and domain gap reduction.
Findings
Synthetic data can replace 60-80% of real data without performance loss.
Synthetic data enhances model performance and domain adaptability.
Flexible data generators are crucial for closing domain gaps.
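To make the substitutability finding concrete, here is a minimal sketch of how a training set might be built with a fixed fraction of real samples swapped for synthetic ones while the total size stays constant. This is not the authors' protocol: the function name `build_substituted_set`, its parameters, and the toy sample tuples are all hypothetical and used only for illustration.

```python
import random


def build_substituted_set(real, synthetic, replace_fraction, seed=0):
    """Swap a fraction of real samples for synthetic ones, keeping total size fixed.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 <= replace_fraction <= 1.0:
        raise ValueError("replace_fraction must be in [0, 1]")

    n_total = len(real)
    n_synth = round(n_total * replace_fraction)  # synthetic samples to add
    n_real = n_total - n_synth                   # real samples to keep
    if n_synth > len(synthetic):
        raise ValueError("not enough synthetic samples for this fraction")

    rng = random.Random(seed)  # seeded so the split is reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)
    return mixed


# Toy data: each sample is a (source, index) pair standing in for an annotated frame.
real = [("real", i) for i in range(10)]
synthetic = [("synth", i) for i in range(20)]

mixed = build_substituted_set(real, synthetic, replace_fraction=0.6)
n_synth = sum(1 for source, _ in mixed if source == "synth")
print(len(mixed), n_synth)
```

Sweeping `replace_fraction` over several values and training one model per mix is one way such a substitution curve could be measured; the exact schedule used in the paper is not stated in this summary.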
Abstract
We study, from an empirical standpoint, the efficacy of synthetic data in real-world scenarios. Leveraging synthetic data for training perception models has become a key strategy embraced by the community due to its efficiency, scalability, perfect annotations, and low cost. Despite these proven advantages, few studies focus on how to efficiently generate synthetic datasets to solve real-world problems, or on the extent to which synthetic data can reduce the effort of real-world data collection. To answer these questions, we systematically investigate three properties of synthetic data: the equivalency of synthetic data to real-world data, the substitutability of synthetic data for real data, and the flexibility of synthetic data generators to close domain gaps. Leveraging the M3Act synthetic data generator, we conduct experiments on DanceTrack and MOT17. Our results…
Taxonomy
Topics: Advanced Database Systems and Queries
