From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
Maan Qraitem, Kate Saenko, Bryan A. Plummer

TL;DR
This paper introduces a two-step training method called From Fake to Real (FFR) that uses balanced synthetic data for pretraining and real data for fine-tuning to reduce spurious correlations in image recognition models.
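The two-step schedule described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the linear model, the SGD routine, the epoch counts, the learning rate, and the names `train`, `from_fake_to_real`, `synthetic_balanced`, and `real` are all assumptions chosen for the example. Each sample has a "core" feature that determines the label and a "bias" feature that only correlates with the label in the real set.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(weights, data, epochs, lr):
    """Plain SGD on the logistic loss; returns an updated copy of the weights."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w

def from_fake_to_real(synthetic_balanced, real, dim,
                      pretrain_epochs=50, finetune_epochs=5, lr=0.1):
    """Hypothetical two-stage schedule: the two data sources are never mixed."""
    w = [0.0] * dim
    w = train(w, synthetic_balanced, pretrain_epochs, lr)  # step 1: balanced synthetic data
    w = train(w, real, finetune_epochs, lr)                # step 2: (biased) real data
    return w

def predict(w, x):
    """Predict class 1 when the linear score is positive."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > 0)

# Toy features: (core, bias, constant 1.0 for the intercept); label follows the core feature.
synthetic_balanced = [((+1, +1, 1.0), 1), ((+1, -1, 1.0), 1),
                      ((-1, +1, 1.0), 0), ((-1, -1, 1.0), 0)]
# Toy real set: the bias feature agrees with the label in 18 of 20 samples.
real = ([((+1, +1, 1.0), 1)] * 9 + [((+1, -1, 1.0), 1)]
        + [((-1, -1, 1.0), 0)] * 9 + [((-1, +1, 1.0), 0)])

weights = from_fake_to_real(synthetic_balanced, real, dim=3)
# One prediction per (core, bias) group, including the two minority groups.
group_predictions = {x[:2]: predict(weights, x) for x, _ in synthetic_balanced}
```

In this toy setting the pretrained weights already rely on the core feature, so a short fine-tuning pass on the biased real set leaves the minority groups correctly classified. Whether that carries over to deep image models is an empirical question that the paper addresses, not something this sketch demonstrates.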
Contribution
The proposed FFR approach effectively mitigates bias by separating training on synthetic and real data, improving worst group accuracy significantly.
Findings
FFR improves worst group accuracy by up to 20%.
It outperforms state-of-the-art bias mitigation methods.
Training on synthetic data first enhances model robustness.
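For readers unfamiliar with the metric named in these findings, worst group accuracy can be computed as in the sketch below. It assumes the common definition in which a group is a (class, bias condition) pair and the metric is the lowest per-group accuracy; the summary itself does not spell out the definition, and the function name and toy labels are illustrative.

```python
from collections import defaultdict

def worst_group_accuracy(labels, biases, preds):
    """Return (lowest per-group accuracy, that group, all per-group accuracies).

    A group is a (label, bias) pair; this grouping is an assumption of the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, b, p in zip(labels, biases, preds):
        total[(y, b)] += 1
        correct[(y, b)] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    worst = min(per_group, key=per_group.get)
    return per_group[worst], worst, per_group

# Toy predictions from a hypothetical model that leans on the background.
labels = ["big"] * 4 + ["small"] * 5
biases = ["indoor", "indoor", "indoor", "outdoor",
          "outdoor", "outdoor", "outdoor", "indoor", "indoor"]
preds = ["big", "big", "big", "small",
         "small", "small", "small", "big", "small"]

wga, worst_group, per_group = worst_group_accuracy(labels, biases, preds)
overall = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Here the overall accuracy is 7/9 while the worst group, `("big", "outdoor")`, is at 0.0, which is why average accuracy alone can hide a spurious correlation.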
Abstract
Visual recognition models are prone to learning spurious correlations induced by a biased training set where certain conditions B (e.g., Indoors) are over-represented in certain classes Y (e.g., Big Dogs). Synthetic data from off-the-shelf large-scale generative models offers a promising direction to mitigate this issue by augmenting underrepresented subgroups in the real dataset. However, by using a mixed distribution of real and synthetic data, we introduce another source of bias due to distributional differences between synthetic and real data (e.g., synthetic artifacts). As we will show, prior work's approach of using synthetic data to resolve the model's bias toward B does not correct the model's bias toward the pair (B, G), where G denotes whether the sample is real or synthetic. Thus, the model could simply learn signals based on the pair (B, G) (e.g., Synthetic Indoors) to…
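The argument about mixing real and synthetic data can be made concrete with simple counting. The sketch below assumes one plausible augmentation scheme (topping up each class's smaller bias groups with synthetic samples until they match that class's largest group); the counts, the scheme, and the function names are hypothetical and not taken from the paper. It then reports how the class is distributed given the bias condition together with the real/synthetic source.

```python
from collections import Counter, defaultdict

def augmentation_plan(real_counts):
    """Synthetic samples needed per (label, bias) group to match the
    largest bias group of the same label (an assumed augmentation scheme)."""
    class_max = defaultdict(int)
    for (label, _bias), n in real_counts.items():
        class_max[label] = max(class_max[label], n)
    return {(label, bias): class_max[label] - n
            for (label, bias), n in real_counts.items()}

def label_given_bias_and_source(real_counts, plan):
    """Empirical P(label | bias, source) in the mixed real + synthetic set."""
    mixed = Counter()
    for (label, bias), n in real_counts.items():
        mixed[(label, bias, "real")] += n
    for (label, bias), n in plan.items():
        if n > 0:
            mixed[(label, bias, "synthetic")] += n
    totals = Counter()
    for (label, bias, source), n in mixed.items():
        totals[(bias, source)] += n
    cond = defaultdict(dict)
    for (label, bias, source), n in mixed.items():
        cond[(bias, source)][label] = n / totals[(bias, source)]
    return dict(cond)

# Hypothetical biased real dataset: big dogs mostly indoors, small dogs mostly outdoors.
real_counts = {
    ("big dog", "indoors"): 90, ("big dog", "outdoors"): 10,
    ("small dog", "indoors"): 10, ("small dog", "outdoors"): 90,
}
plan = augmentation_plan(real_counts)
cond = label_given_bias_and_source(real_counts, plan)
```

After augmentation each class has 90 indoor and 90 outdoor samples, so the bias condition alone no longer predicts the class. The condition paired with the source still does: in this toy mix every synthetic indoor sample is a small dog, while 90% of real indoor samples are big dogs, so a model that can detect synthetic artifacts has a new shortcut available.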
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
