DPL: Decoupled Prompt Learning for Vision-Language Models

Chen Xu; Yuhan Zhu; Guozhen Zhang; Haocheng Shen; Yixuan Liao; Xiaoxin Chen; Gangshan Wu; Limin Wang

arXiv:2308.10061·cs.CV·August 22, 2023

Open Access

TL;DR

This paper introduces Decoupled Prompt Learning (DPL), a novel method that reformulates attention in prompt learning to improve generalization to unseen classes in vision-language models, achieving state-of-the-art results without extra data.

Contribution

The paper proposes a decoupled attention mechanism in prompt learning that enhances robustness and generalization, along with language-conditioned textual prompting for multi-modal applications.

Findings

1. Achieves state-of-the-art performance on 15 image recognition datasets.
2. Does not require auxiliary regularization or additional training data.
3. Enhances generalization to unseen classes in vision-language models.

Abstract

Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, which limits their ability to generalize to unseen classes. In this paper, we propose a new method, Decoupled Prompt Learning (DPL), which reformulates the attention in prompt learning to alleviate this problem. Specifically, we theoretically investigate the collaborative process between prompts and instances (i.e., image patches/text tokens) by reformulating the original self-attention into four separate sub-processes. Through detailed analysis, we observe that certain sub-processes can be strengthened, via approximation techniques, to bolster robustness and generalizability. Furthermore, we introduce language-conditioned textual prompting based on decoupled…
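The split of self-attention into "four separate sub-processes" can be illustrated with a small sketch. It assumes a single attention head with identity query/key/value projections (a simplification; real CLIP layers use learned projections and several heads), and takes the token sequence to be learnable prompts P concatenated with instance tokens I. Each query group attends to each key group, giving P→P, P→I, I→P and I→I. Because the softmax normalizes over all keys jointly, the full output is recovered by weighting each key group's attention-weighted mean by that group's share of the softmax mass. The function names (`sub_process`, `decoupled_self_attention`) are hypothetical, and the sketch only checks that the decomposition is exact; it does not reproduce DPL's approximations or which sub-processes the paper strengthens.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [_dot(q, k) / math.sqrt(d) for k in tokens]
        shift = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - shift) for s in scores]
        z = sum(weights)
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z
                        for j in range(d)])
    return outputs


def sub_process(queries, keys):
    """One query-group -> key-group sub-process (hypothetical helper).

    For every query, returns (log_mass, mean_value): the log of the summed
    unnormalized attention mass on this key group, and the attention-weighted
    mean of the group's values (values are the keys themselves here).
    """
    d = len(queries[0])
    results = []
    for q in queries:
        scores = [_dot(q, k) / math.sqrt(d) for k in keys]
        shift = max(scores)
        weights = [math.exp(s - shift) for s in scores]
        z = sum(weights)
        mean = [sum(w * k[j] for w, k in zip(weights, keys)) / z for j in range(d)]
        results.append((shift + math.log(z), mean))
    return results


def decoupled_self_attention(prompts, instances):
    """Attention over [prompts; instances] rebuilt from four sub-processes."""
    subs = {
        ("P", "P"): sub_process(prompts, prompts),
        ("P", "I"): sub_process(prompts, instances),
        ("I", "P"): sub_process(instances, prompts),
        ("I", "I"): sub_process(instances, instances),
    }
    outputs = []
    # Output order matches the concatenated sequence: prompts first, then instances.
    for group, count in (("P", len(prompts)), ("I", len(instances))):
        for n in range(count):
            log_p, mean_p = subs[(group, "P")][n]
            log_i, mean_i = subs[(group, "I")][n]
            # Each key group's share of the joint softmax is proportional to its mass.
            m = max(log_p, log_i)
            mass_p, mass_i = math.exp(log_p - m), math.exp(log_i - m)
            alpha = mass_p / (mass_p + mass_i)
            outputs.append([alpha * x + (1 - alpha) * y for x, y in zip(mean_p, mean_i)])
    return outputs, subs


# Usage: toy 2-d prompts and instance tokens (arbitrary illustrative values).
prompts = [[0.5, -0.2], [0.1, 0.3]]
instances = [[1.0, 0.0], [0.0, 1.0], [-0.4, 0.7]]
decoupled, subs = decoupled_self_attention(prompts, instances)
reference = full_self_attention(prompts + instances)
```

Keeping the four sub-processes as separate terms is what makes it possible to inspect or reweight one of them (for example, how instances attend to prompts) without touching the others, which is the kind of handle the abstract's analysis relies on.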

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques