Cascade Prompt Learning for Vision-Language Model Adaptation

Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, and Xiang Li

arXiv:2409.17805 · cs.CV · September 27, 2024

TL;DR

CasPL introduces a two-phase cascade prompt learning framework for vision-language models, improving adaptation to downstream tasks by capturing both domain-general and task-specific knowledge, reducing overfitting, and enhancing performance.

Contribution

The paper proposes a novel cascade prompt learning paradigm with two distinct prompt phases, enabling simultaneous extraction of domain-general and task-specific knowledge for better model adaptation.

Findings

01 CasPL outperforms previous methods like PromptSRC on multiple datasets.

02 It achieves a 1.85% to 3.44% improvement in classification accuracy.

03 CasPL maintains a good balance between performance and inference speed.

Abstract

Prompt learning has emerged as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks. However, current learnable prompt tokens are primarily used for the single phase of adapting to tasks (i.e., the adapting prompt), which easily leads to overfitting. In this work, we propose a novel Cascade Prompt Learning (CasPL) framework that enables prompt learning to serve both generic and specific expertise (i.e., boosting and adapting prompts) simultaneously. Specifically, CasPL is a new learning paradigm comprising two distinct phases of learnable prompts. The first, the boosting prompt, is crafted to extract domain-general knowledge from a larger, senior CLIP teacher model by aligning their predicted logits using extensive unlabeled domain images. The second, the adapting prompt, is then cascaded with the frozen first set to fine-tune the…
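
The two phases described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract only says that student and teacher logits are aligned, so the temperature-scaled KL divergence used here is an assumption, as are the function names (`logit_alignment_loss`, `cascade_prompts`), the token placeholders, and the order in which the two prompt sets are joined.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_alignment_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on softened predictions.

    Assumption: the abstract does not name the alignment loss; KL divergence
    is a common choice for logit distillation and is used here for illustration.
    No labels are needed, which matches the use of unlabeled domain images.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cascade_prompts(boosting_prompt, adapting_prompt):
    """Join the two prompt sets, marking which tokens remain trainable.

    Phase 1 tokens (boosting) are frozen; phase 2 tokens (adapting) are
    trainable. The concatenation order is a hypothetical choice.
    """
    frozen = [(token, False) for token in boosting_prompt]
    trainable = [(token, True) for token in adapting_prompt]
    return frozen + trainable


# Phase 1: the loss is zero when the student matches the teacher exactly.
teacher = [2.0, 1.0, 0.0]
loss_same = logit_alignment_loss(teacher, teacher)
loss_diff = logit_alignment_loss([0.0, 1.0, 2.0], teacher)

# Phase 2: only the adapting tokens are flagged as trainable.
cascaded = cascade_prompts(["b1", "b2"], ["a1", "a2"])
```

In this sketch, phase 1 would minimize `logit_alignment_loss` with respect to the boosting tokens only, and phase 2 would update only the entries of `cascaded` flagged as trainable while the boosting tokens stay fixed.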

Code & Models

Models

🤗 zhengli97/prompt_learning_dataset (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Sparse Evolutionary Training · Balanced Selection · Contrastive Language-Image Pre-training