Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

arXiv:2312.02253 · cs.CV · January 22, 2025 · 1 citation

TL;DR

This paper demonstrates that scaling up training with diverse synthetic images, generated using off-the-shelf models and advanced diversification techniques, can significantly improve visual recognition performance without fine-tuning generative models.

Contribution

The authors introduce a framework that combines off-the-shelf generative models with large language models and domain adaptation techniques to generate diverse synthetic training images, improving recognition without fine-tuning the generative models.

Findings

1. Synthetic data up to 6x the size of ImageNet improves recognition.
2. Diversification and domain adaptation techniques make synthetic data more effective.
3. The performance gains include strong out-of-domain generalization.

Abstract

Recent advances in generative deep learning have enabled the creation of high-quality synthetic images in text-to-image generation. Prior work shows that fine-tuning a pretrained diffusion model on ImageNet and generating synthetic training images from the fine-tuned model can enhance an ImageNet classifier's performance. However, performance degrades as synthetic images outnumber real ones. In this paper, we explore whether generative fine-tuning is essential for this improvement and whether it is possible to further scale up training using more synthetic data. We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts. Specifically, we leverage large language models (LLMs) and CLIP to resolve class name ambiguity. To diversify…
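The abstract says LLMs and CLIP are used to resolve class name ambiguity; for instance, a class name like "crane" can denote a bird or a construction machine. The sketch below illustrates one plausible scoring step and is not the authors' exact procedure. It assumes an LLM has already proposed several candidate descriptions per class and that CLIP text and image embeddings have already been computed (placeholder NumPy arrays stand in for them here). The function `pick_description` is hypothetical: it picks the candidate whose text embedding has the highest mean cosine similarity to embeddings of real images of that class.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row of x to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pick_description(text_embs: np.ndarray, image_embs: np.ndarray) -> int:
    """Hypothetical helper: choose the candidate description that best matches real images.

    text_embs:  (num_candidates, dim) embeddings of candidate class descriptions
                (assumed to come from a CLIP text encoder).
    image_embs: (num_images, dim) embeddings of real images of the class
                (assumed to come from the matching CLIP image encoder).
    Returns the index of the candidate with the highest mean cosine similarity.
    """
    t = l2_normalize(text_embs)
    v = l2_normalize(image_embs)
    sims = t @ v.T  # (num_candidates, num_images) cosine similarities
    return int(np.argmax(sims.mean(axis=1)))


# Toy 2-D "embeddings": candidate 0 ~ "crane, a wading bird", candidate 1 ~ "crane, a lifting machine".
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])
bird_photos = np.array([[0.9, 0.1], [0.8, 0.3]])
best = pick_description(candidates, bird_photos)  # index of the bird description
```

Averaging over several real images, rather than scoring against a single one, makes the choice less sensitive to an atypical example. The selected description could then serve as the class name in text-to-image prompts.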

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Language-Image Pre-training · Diffusion · Batch Normalization · Auxiliary Batch Normalization
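The taxonomy lists Auxiliary Batch Normalization among the methods. The truncated abstract does not say exactly how the authors apply it, so the following is only a minimal NumPy sketch of the general technique under a stated assumption: real and synthetic batches share all other weights but are normalized by separate BN layers, with synthetic batches routed to the auxiliary branch during training and only the main branch used at test time. The class names `BatchNorm1d` and `AuxiliaryBatchNorm` are illustrative, not a library API.

```python
import numpy as np


class BatchNorm1d:
    """Plain batch normalization over feature vectors of shape (batch, features)."""

    def __init__(self, num_features: int, momentum: float = 0.1, eps: float = 1e-5):
        self.gamma = np.ones(num_features)   # learnable scale
        self.beta = np.zeros(num_features)   # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x: np.ndarray, training: bool) -> np.ndarray:
        if training:
            # Normalize with the current batch's statistics (biased variance).
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # Exponential moving average of batch statistics, used at eval time.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta


class AuxiliaryBatchNorm:
    """Main BN for real images plus a separate auxiliary BN for synthetic images.

    Assumption for illustration: synthetic batches use the auxiliary branch while
    training; at test time every input goes through the main branch.
    """

    def __init__(self, num_features: int):
        self.main = BatchNorm1d(num_features)
        self.aux = BatchNorm1d(num_features)

    def __call__(self, x: np.ndarray, synthetic: bool = False, training: bool = True) -> np.ndarray:
        bn = self.aux if (synthetic and training) else self.main
        return bn(x, training)


rng = np.random.default_rng(0)
layer = AuxiliaryBatchNorm(4)
real = rng.normal(0.0, 1.0, size=(32, 4))    # stand-in for real-image features
synth = rng.normal(3.0, 2.0, size=(32, 4))   # stand-in for shifted synthetic features
out_real = layer(real, synthetic=False)
out_synth = layer(synth, synthetic=True)
```

Keeping separate statistics means the shifted synthetic distribution never updates the running mean and variance that the main branch applies to real test images, which is the usual motivation for auxiliary BN when two input distributions differ.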