Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

TL;DR
This paper shows that scaling up training with diverse synthetic images, generated by off-the-shelf generative models without any generative fine-tuning, can significantly improve visual recognition performance.
Contribution
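The authors introduce a framework that generates diverse synthetic training images from off-the-shelf generative models, with no generative fine-tuning. It uses large language models (LLMs) and CLIP to resolve class name ambiguity, and it also addresses the lack of diversity in naive prompts and the domain shift between synthetic and real images.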
Findings
Synthetic data up to 6x the size of ImageNet improves recognition.
Diversity and domain adaptation techniques enhance synthetic data effectiveness.
Performance gains include strong out-of-domain generalization.
Abstract
Recent advances in generative deep learning have enabled the creation of high-quality synthetic images in text-to-image generation. Prior work shows that fine-tuning a pretrained diffusion model on ImageNet and generating synthetic training images from the finetuned model can enhance an ImageNet classifier's performance. However, performance degrades as synthetic images outnumber real ones. In this paper, we explore whether generative fine-tuning is essential for this improvement and whether it is possible to further scale up training using more synthetic data. We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts. Specifically, we leverage large language models (LLMs) and CLIP to resolve class name ambiguity. To diversify…
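The abstract says LLMs and CLIP are used to resolve class name ambiguity but does not spell out the procedure here. As a minimal sketch of one plausible reading, the snippet below picks, among several candidate descriptions of an ambiguous class name, the one whose text embedding is most similar to an embedding of a real image from that class. Everything in it is an assumption made for illustration: the helper names `cosine` and `pick_description`, the "crane" example, and the toy 3-dimensional vectors, which merely stand in for real CLIP text and image embeddings. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_description(candidates, image_embedding):
    """Return the candidate description closest to the image embedding.

    candidates: dict mapping a description string to its text embedding.
    image_embedding: embedding of a real image from the target class.
    (Hypothetical helper; the selection rule is an assumption.)
    """
    return max(candidates, key=lambda d: cosine(candidates[d], image_embedding))


# Toy vectors standing in for CLIP embeddings (hypothetical values).
# In practice the descriptions might come from an LLM and the vectors
# from a CLIP text/image encoder.
candidates = {
    "crane, a large long-necked wading bird": [0.9, 0.1, 0.0],
    "crane, a machine for lifting heavy loads": [0.1, 0.9, 0.1],
}
real_image_embedding = [0.8, 0.2, 0.1]

best = pick_description(candidates, real_image_embedding)
print(best)
```

The selected description could then be used in the text-to-image prompt in place of the bare class name, so the generator does not render the wrong sense of the word.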
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
Methods: Contrastive Language-Image Pre-training · Diffusion · Batch Normalization · Auxiliary Batch Normalization
