Discriminative Class Tokens for Text-to-Image Diffusion Models
Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

TL;DR
This paper introduces a fast, non-invasive fine-tuning method for text-to-image diffusion models that enhances image accuracy and quality by using discriminative signals from a pretrained classifier, without needing extensive retraining.
Contribution
The authors propose a novel technique to improve diffusion models by iteratively modifying input token embeddings guided by a pretrained classifier, enabling better image accuracy and control.
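To make the idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a tiny linear map `G` as a stand-in for the frozen diffusion model and a linear softmax layer `W` as a stand-in for the frozen pretrained classifier; both matrices, the function names, and the hyperparameters are hypothetical. Only the token embedding is updated, by gradient descent on the classifier's cross-entropy loss for the target class.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical frozen "generator": maps a 2-d token embedding to a 3-d "image".
# In the paper this role is played by a text-to-image diffusion model.
G = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
# Hypothetical frozen "classifier": maps the 3-d "image" to 2 class logits.
W = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]

def class_probs(token):
    """Generate an 'image' from the token and classify it."""
    image = matvec(G, token)
    return softmax(matvec(W, image))

def optimize_token(token, target, steps=50, lr=0.5):
    """Update only the token embedding; G and W are never modified."""
    token = list(token)
    Gt, Wt = transpose(G), transpose(W)
    for _ in range(steps):
        p = class_probs(token)
        # Gradient of cross-entropy w.r.t. the logits: p - onehot(target).
        dlogits = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
        # Backpropagate through the classifier, then through the generator.
        dimage = matvec(Wt, dlogits)
        dtoken = matvec(Gt, dimage)
        token = [t - lr * g for t, g in zip(token, dtoken)]
    return token

initial = [0.0, 0.0]
before = class_probs(initial)[1]   # probability of the target class at the start
learned = optimize_token(initial, target=1)
after = class_probs(learned)[1]    # probability after optimizing the token
```

In this toy setting the target-class probability rises from `before` to `after` while the generator and classifier weights stay untouched, which is the sense in which such a procedure can be called non-invasive. A real pipeline would need to backpropagate through the image generation process itself, which this sketch does not model.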
Findings
Generated images are more accurate and of higher quality than those from standard diffusion models.
The method can be used to augment training data in low-resource settings.
The generated images reveal information about the data used to train the guiding classifier.
Abstract
Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a…
Videos
Discriminative Class Tokens for Text-to-Image Diffusion Models (YouTube)
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
