StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

Yonglong Tian; Lijie Fan; Phillip Isola; Huiwen Chang; Dilip Krishnan

arXiv:2306.00984 · cs.CV · October 27, 2023 · 23 cites

TL;DR

This paper shows that synthetic images from a text-to-image model (Stable Diffusion), when generated with a suitable classifier-free guidance scale and used with multi-positive contrastive learning, can yield visual representations that match or outperform those learned from real images, especially when combined with language supervision.

Contribution

The paper introduces StableRep, a multi-positive contrastive learning method that treats the multiple images generated from the same text prompt as positives for each other, and uses it to learn visual representations from synthetic images.

Findings

01

Synthetic images can match or outperform real images in training visual representations.

02

Trained only on synthetic images, StableRep surpasses SimCLR and CLIP trained using the same set of text prompts.

03

With language supervision, StableRep outperforms CLIP trained on more real images.

Abstract

We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural question in light of the excellent performance of such models in generating high-quality images. We consider specifically Stable Diffusion, one of the leading open-source text-to-image models. We show that (1) when the generative model is configured with a proper classifier-free guidance scale, training self-supervised methods on synthetic images can match or beat the real image counterpart; (2) by treating the multiple images generated from the same text prompt as positives for each other, we develop a multi-positive contrastive learning method, which we call StableRep. With solely synthetic images, the representations learned by StableRep surpass the performance of representations learned by SimCLR and CLIP using the same set of text prompts…
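The idea of treating images generated from the same prompt as mutual positives can be illustrated with a small sketch. The version below is an assumption about one reasonable formulation, not code from the paper: for each anchor image it takes a softmax over cosine similarities to every other image in the batch and computes the cross-entropy against a target that is uniform over the images sharing the anchor's prompt. The function name, the `prompt_ids` argument, and the default `temperature` are hypothetical choices made for illustration.

```python
import math


def multi_positive_contrastive_loss(embeddings, prompt_ids, temperature=0.1):
    """Illustrative multi-positive contrastive loss (hypothetical helper).

    embeddings: list of equal-length float vectors, one per image.
    prompt_ids: list of ids; images with the same id count as positives.
    Returns the mean loss over anchors that have at least one positive.
    """
    n = len(embeddings)

    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            raise ValueError("zero-norm embedding")
        normed.append([x / norm for x in vec])

    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if prompt_ids[j] == prompt_ids[i]]
        if not positives:
            continue  # this anchor has no same-prompt partner in the batch

        # Temperature-scaled similarities to every other image.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in others
        }

        # Numerically stable log of the softmax normalizer.
        peak = max(logits.values())
        log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))

        # Cross-entropy against a uniform target over the positives.
        total += -sum(logits[j] - log_z for j in positives) / len(positives)
        counted += 1

    if counted == 0:
        raise ValueError("no anchor has a positive in this batch")
    return total / counted


# Toy batch: two prompts, two images each (values are made up).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = multi_positive_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
shuffled = multi_positive_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the prompt grouping matches the geometry of the embeddings (`aligned`) than when it does not (`shuffled`), which is the behavior a loss of this kind is meant to reward. A real training setup would use a tensor library and batched operations rather than Python loops.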

Code & Models

Videos

StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques

Methods: Residual Block · Residual Connection · Convolution · Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Bottleneck Residual Block