From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition

Maan Qraitem; Kate Saenko; Bryan A. Plummer

arXiv:2308.04553·cs.CV·July 18, 2024·1 cites

TL;DR

This paper introduces a two-step training method called From Fake to Real (FFR) that uses balanced synthetic data for pretraining and real data for fine-tuning to reduce spurious correlations in image recognition models.
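
The two-step schedule can be illustrated with a minimal sketch. This is not the authors' implementation: a tiny logistic-regression model stands in for an image classifier, and the function names (`sgd_epochs`, `train_two_stage`), epoch counts, learning rates, and toy data are hypothetical placeholders. The only point carried over from the summary above is the ordering: train on balanced synthetic data first, then continue on real data only, so the two sources are never mixed within one stage.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, binary label)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Sigmoid of the linear score w . x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_epochs(w: List[float], data: Sequence[Sample], epochs: int,
               lr: float, rng: random.Random) -> None:
    """Update w in place with plain SGD on the logistic loss."""
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            err = predict(w, x) - y  # gradient of the loss w.r.t. the score
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi


def train_two_stage(synthetic_balanced: Sequence[Sample],
                    real: Sequence[Sample], dim: int,
                    seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    w = [0.0] * dim
    # Stage 1 (assumed settings): pretrain on balanced synthetic data only.
    sgd_epochs(w, synthetic_balanced, epochs=20, lr=0.5, rng=rng)
    # Stage 2 (assumed settings): fine-tune the same weights on real data only.
    sgd_epochs(w, real, epochs=5, lr=0.05, rng=rng)
    return w


# Toy, made-up data: the second feature carries the class signal.
synthetic = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
real = [([1.0, 0.5], 1), ([1.0, -0.5], 0)]
weights = train_two_stage(synthetic, real, dim=2)
```

In a deep-learning setting the same pattern would amount to running one training loop over a synthetic data loader and then a second loop over a real data loader with the same model object.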

Contribution

The proposed FFR approach mitigates bias by training on synthetic and real data in separate stages, improving worst-group accuracy by up to 20%.

Findings

01. FFR improves worst-group accuracy by up to 20%.
02. It outperforms state-of-the-art bias mitigation methods.
03. Training on synthetic data first enhances model robustness.
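
Worst-group accuracy is conventionally the minimum accuracy over subgroups of the evaluation set. The sketch below assumes subgroups are (class, bias) pairs, matching the $Y$ and $B$ notation used in the abstract; the function names and the toy predictions are illustrative and are not taken from the paper's code.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Group = Tuple[int, int]  # (class label Y, bias attribute B)


def group_accuracies(preds: Sequence[int], labels: Sequence[int],
                     biases: Sequence[int]) -> Dict[Group, float]:
    """Accuracy computed separately inside every (Y, B) subgroup."""
    correct: Dict[Group, int] = defaultdict(int)
    total: Dict[Group, int] = defaultdict(int)
    for p, y, b in zip(preds, labels, biases):
        total[(y, b)] += 1
        correct[(y, b)] += int(p == y)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(preds: Sequence[int], labels: Sequence[int],
                         biases: Sequence[int]) -> float:
    """Minimum of the per-group accuracies."""
    return min(group_accuracies(preds, labels, biases).values())


# Made-up evaluation set with four subgroups.
labels = [1, 1, 1, 0, 0, 0]
biases = [0, 0, 1, 0, 1, 1]
preds = [1, 1, 1, 0, 0, 1]

overall = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
worst = worst_group_accuracy(preds, labels, biases)  # 0.5, from group (0, 1)
```

Here the overall accuracy is 5/6 while the worst group reaches only 0.5, which is why the metric is used to expose reliance on a spurious attribute that an average would hide.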

Abstract

Visual recognition models are prone to learning spurious correlations induced by a biased training set in which certain conditions $B$ (e.g., Indoors) are over-represented in certain classes $Y$ (e.g., Big Dogs). Synthetic data from off-the-shelf large-scale generative models offers a promising direction to mitigate this issue by augmenting underrepresented subgroups in the real dataset. However, using a mixed distribution of real and synthetic data introduces another source of bias due to distributional differences between synthetic and real data (e.g., synthetic artifacts). As we will show, prior work's approach of using synthetic data to resolve the model's bias toward $B$ does not correct the model's bias toward the pair $(B, G)$, where $G$ denotes whether the sample is real or synthetic. Thus, the model could simply learn signals based on the pair $(B, G)$ (e.g., Synthetic Indoors) to…
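
The $(B, G)$ issue can be made concrete with a small counting exercise. The counts below are invented for illustration and do not come from the paper: a real set in which Big Dogs are mostly Indoors and Small Dogs mostly Outdoors is topped up with synthetic images until every $(Y, B)$ subgroup has the same size. The sketch then computes the share of each class within each $(B, G)$ cell.

```python
from collections import Counter
from typing import Dict, Tuple

Group = Tuple[str, str, str]  # (class Y, bias B, source G)

# Hypothetical counts, chosen only to illustrate the argument.
counts: Dict[Group, int] = {
    ("big", "indoors", "real"): 90,
    ("big", "outdoors", "real"): 10,
    ("small", "indoors", "real"): 10,
    ("small", "outdoors", "real"): 90,
    # Synthetic images added to the two under-represented subgroups.
    ("big", "outdoors", "synthetic"): 80,
    ("small", "indoors", "synthetic"): 80,
}


def subgroup_sizes(counts: Dict[Group, int]) -> Dict[Tuple[str, str], int]:
    """Total images per (Y, B) subgroup, ignoring the source G."""
    sizes: Counter = Counter()
    for (y, b, _g), n in counts.items():
        sizes[(y, b)] += n
    return dict(sizes)


def class_share_given_bias_and_source(
        counts: Dict[Group, int]) -> Dict[Group, float]:
    """Empirical share of class Y within each (B, G) cell."""
    totals: Counter = Counter()
    for (_y, b, g), n in counts.items():
        totals[(b, g)] += n
    return {(y, b, g): n / totals[(b, g)] for (y, b, g), n in counts.items()}


sizes = subgroup_sizes(counts)
shares = class_share_given_bias_and_source(counts)
```

In this toy setup every $(Y, B)$ subgroup holds 90 images, yet all Synthetic Indoors images are Small Dogs and 90% of Real Indoors images are Big Dogs, so the source flag combined with the background still predicts the class.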

Code & Models

Repositories

mqraitem/from-fake-to-real (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
