Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

TL;DR
This paper introduces Discriminative Stable Diffusion (DSD), a method that repurposes pre-trained diffusion models for few-shot image-text matching by leveraging cross-attention scores and prompt learning, outperforming state-of-the-art methods on benchmark datasets.
Contribution
The paper presents a novel approach to convert pre-trained diffusion models into discriminative learners for image-text matching, a task not traditionally associated with diffusion models.
Findings
DSD outperforms state-of-the-art methods on benchmark datasets.
Cross-attention scores effectively capture visual-textual mutual influence.
Efficient prompt learning enables few-shot discriminative performance.
Abstract
Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To answer this question, we propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners. Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information and fine-tunes the model via efficient attention-based prompt learning to perform image-text matching. By comparing DSD with state-of-the-art methods on several…
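The idea of using cross-attention scores as a matching signal can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names (`attention_logits`, `match_score`, `best_caption`), the toy feature vectors, and the log-sum-exp aggregation are all assumptions chosen for illustration, and the prompt-learning step mentioned in the abstract is not covered here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def attention_logits(patches: Sequence[Vector], tokens: Sequence[Vector]) -> List[List[float]]:
    """Scaled dot-product logits: image patches act as queries, text tokens as keys."""
    scale = 1.0 / math.sqrt(len(patches[0]))
    return [
        [scale * sum(p * t for p, t in zip(patch, token)) for token in tokens]
        for patch in patches
    ]


def match_score(patches: Sequence[Vector], tokens: Sequence[Vector]) -> float:
    """Aggregate all patch-token logits with log-sum-exp (a smooth maximum).

    The aggregation rule is a hypothetical choice for this sketch.
    """
    flat = [x for row in attention_logits(patches, tokens) for x in row]
    m = max(flat)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in flat))


def best_caption(patches: Sequence[Vector], captions: Sequence[Sequence[Vector]]) -> int:
    """Return the index of the caption whose tokens score highest against the image."""
    scores = [match_score(patches, caption) for caption in captions]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (made up): four orthogonal "patch" features and two candidate captions.
patches = [[3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0],
           [0.0, 0.0, 3.0, 0.0], [0.0, 0.0, 0.0, 3.0]]
matching = [[3.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 0.0]]      # aligned with two patches
mismatched = [[-3.0, 0.0, 0.0, 0.0], [0.0, -3.0, 0.0, 0.0]]  # anti-aligned with them

chosen = best_caption(patches, [matching, mismatched])
```

In a real pipeline the patch and token features would come from the diffusion model's cross-attention layers rather than hand-written vectors; the sketch only shows how attention logits can be reduced to a single score for ranking candidate captions.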
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Diffusion
