Discriminative Class Tokens for Text-to-Image Diffusion Models

Idan Schwartz; Vésteinn Snæbjarnarson; Hila Chefer; Ryan Cotterell; Serge Belongie; Lior Wolf; Sagie Benaim

arXiv:2303.17155·cs.CV·January 13, 2025·1 citation

TL;DR

This paper introduces a fast, non-invasive fine-tuning method for text-to-image diffusion models that enhances image accuracy and quality by using discriminative signals from a pretrained classifier, without needing extensive retraining.

Contribution

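The authors propose a technique that improves the images produced by text-to-image diffusion models by iteratively updating input token embeddings under the guidance of a pretrained classifier, without modifying the diffusion model itself. This yields more accurate images and finer control over the generated class.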
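As a rough illustration of the idea in the contribution statement, the toy sketch below optimizes a single token embedding so that a frozen classifier assigns the generated output to a target class, while the generator and classifier weights stay fixed. It is not the authors' implementation: the diffusion model and the pretrained classifier are replaced by hypothetical fixed linear maps (`gen_w` and `cls_w`), the cross-entropy gradient is derived by hand rather than with autograd, and the names `optimize_class_token`, `lr`, `steps`, and the early-stopping threshold `stop_prob` are assumptions made for this example.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_class_token(gen_w, cls_w, token, target, lr=0.5, steps=200, stop_prob=0.95):
    """Gradient descent on the token embedding only; gen_w and cls_w are never modified.

    gen_w: F x D toy stand-in for the frozen generator (embedding -> image features).
    cls_w: K x F toy stand-in for the frozen classifier (features -> class logits).
    Returns (token, p_target, updates_performed).
    """
    token = list(token)  # work on a copy so the caller's embedding is untouched
    gen_t, cls_t = transpose(gen_w), transpose(cls_w)
    for step in range(steps):
        feats = matvec(gen_w, token)           # "generate" from the current token
        probs = softmax(matvec(cls_w, feats))  # frozen classifier's prediction
        if probs[target] >= stop_prob:         # hypothetical early-stopping rule
            return token, probs[target], step
        # Gradient of cross-entropy w.r.t. the logits is probs - one_hot(target).
        g_logits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        g_feats = matvec(cls_t, g_logits)      # backpropagate through the classifier
        g_token = matvec(gen_t, g_feats)       # backpropagate through the generator
        token = [t - lr * g for t, g in zip(token, g_token)]
    probs = softmax(matvec(cls_w, matvec(gen_w, token)))
    return token, probs[target], steps


if __name__ == "__main__":
    # Toy example: 2-d token, 3 features, 3 classes; push the token toward class 2.
    gen_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    cls_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    token, p, n = optimize_class_token(gen_w, cls_w, [0.0, 0.0], target=2, steps=5000)
    print(f"p(class 2) = {p:.3f} after {n} updates; token = {token}")
```

Only `token` changes during optimization; `gen_w` and `cls_w` are read but never written, which is the toy analogue of leaving the diffusion model untouched while a classifier's signal shapes the input embedding.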

Findings

01. Generated images are more accurate and of higher quality.

02. The method can augment training data in low-resource settings.

03. The method reveals information about the data used to train the classifier.

Abstract

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, and (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a…

Code & Models

Repositories

idansc/discriminative_class_tokens
PyTorch · Official

Videos

Discriminative Class Tokens for Text-to-Image Diffusion Models · YouTube

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion