StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian, Lijie Fan, Phillip Isola, Huiwen Chang, Dilip Krishnan

TL;DR
This paper demonstrates that synthetic images generated by text-to-image models, when generated with a suitable classifier-free guidance scale and trained with a multi-positive contrastive objective, can yield visual representations that match or outperform those learned from real images, especially when combined with language supervision.
Contribution
The paper introduces StableRep, a multi-positive contrastive learning method that treats the multiple synthetic images generated from the same text prompt as positives for each other.
Findings
Synthetic images can match or outperform real images in training visual representations.
Trained only on synthetic images, StableRep surpasses SimCLR and CLIP trained with the same set of text prompts.
With language supervision, StableRep outperforms CLIP trained on more real images.
Abstract
We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in light of the excellent performance of such models in generating high-quality images. We consider specifically Stable Diffusion, one of the leading open-source text-to-image models. We show that (1) when the generative model is configured with a proper classifier-free guidance scale, training self-supervised methods on synthetic images can match or beat the real image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep. With solely synthetic images, the representations learned by StableRep surpass the performance of representations learned by SimCLR and CLIP using the same set of text prompts…
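The multi-positive idea described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name, the temperature value, and the choice of a uniform target distribution over same-prompt images are all assumptions made here for illustration.

```python
import numpy as np


def multi_positive_contrastive_loss(embeddings, prompt_ids, temperature=0.1):
    """Illustrative multi-positive contrastive loss (hypothetical helper).

    embeddings: (n, d) array of image embeddings.
    prompt_ids: (n,) array; images sharing an id came from the same prompt.
    Every image must have at least one other image with the same id.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    ids = np.asarray(prompt_ids)
    n = z.shape[0]

    # Cosine similarity between all pairs, scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # An image is never compared against itself.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Row-wise log-softmax: the contrastive distribution over other images.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other images generated from the same prompt.
    positives = (ids[:, None] == ids[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    if np.any(n_pos == 0):
        raise ValueError("each image needs at least one same-prompt positive")

    # Assumed target: uniform probability over an image's positives.
    p = positives / n_pos[:, None]

    # Cross-entropy between target and contrastive distribution, per image.
    per_image = -(p * np.where(positives, log_q, 0.0)).sum(axis=1)
    return float(per_image.mean())


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    print(multi_positive_contrastive_loss(emb, [0, 0, 1, 1]))
```

With several images per prompt, each row's target spreads its probability mass over all same-prompt images, so the loss is small only when those images sit closer to each other than to images from other prompts.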
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
Methods: Residual Block · Residual Connection · Convolution · Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Bottleneck Residual Block
