Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Xuehai He; Weixi Feng; Tsu-Jui Fu; Varun Jampani; Arjun Akula; Pradyumna Narayana; Sugato Basu; William Yang Wang; Xin Eric Wang

arXiv:2305.10722·cs.CV·April 26, 2024·1 cites

Open Access · 1 Repo

TL;DR

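This paper introduces Discriminative Stable Diffusion (DSD), a method that repurposes pre-trained text-to-image diffusion models for few-shot image-text matching. It scores image-text pairs using the model's cross-attention and fine-tunes via attention-based prompt learning, and it is reported to outperform state-of-the-art methods on benchmark datasets.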

Contribution

The paper presents a novel approach to convert pre-trained diffusion models into discriminative learners for image-text matching, a task not traditionally associated with diffusion models.

Findings

01. DSD outperforms state-of-the-art methods on benchmark datasets.
02. Cross-attention scores effectively capture visual-textual mutual influence.
03. Efficient prompt learning enables few-shot discriminative performance.

Abstract

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tunes the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several…
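
To make the idea of a cross-attention matching score concrete, here is a minimal toy sketch. It is not the authors' implementation: the small hand-written vectors stand in for the image-side queries and text-side keys that a Stable Diffusion U-Net would produce, and the pooling rule (peak attention per text token, averaged over tokens) is an assumption chosen only for illustration. The function names `cross_attention` and `matching_score` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys):
    """Scaled dot-product attention weights.

    queries: one feature vector per image position (stand-in for U-Net features).
    keys:    one feature vector per text token (stand-in for text-encoder output).
    Returns one row per image position, normalized over text tokens.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def matching_score(queries, keys):
    """Pool the attention map into one scalar per image-text pair.

    ASSUMPTION: the pooling here (max over image positions for each token,
    then mean over tokens) is illustrative and not taken from the paper.
    """
    attn = cross_attention(queries, keys)
    peaks = [max(row[j] for row in attn) for j in range(len(keys))]
    return sum(peaks) / len(peaks)


# Toy image with two positions, each carrying a distinct feature.
image_queries = [[4.0, 0.0], [0.0, 4.0]]
# Caption whose tokens each align with one image position.
aligned_caption = [[4.0, 0.0], [0.0, 4.0]]
# Caption whose tokens are identical, so attention cannot tell them apart.
uninformative_caption = [[1.0, 1.0], [1.0, 1.0]]

score_aligned = matching_score(image_queries, aligned_caption)
score_uninformative = matching_score(image_queries, uninformative_caption)
```

In this toy setting the aligned caption receives the higher score, because each of its tokens attracts nearly all of the attention from one image position, while the uninformative caption yields uniform attention. In a few-shot matching setup, a score of this kind could be compared across candidate captions for the same image.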

Code & Models

Repositories

eric-ai-lab/dsd · jax · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Diffusion