Advancing Prompt Learning through an External Layer
Fangming Cui, Xun Yang, Chao Wu, Liang Xiao, Xinmei Tian

TL;DR
This paper introduces EnPrompt with an External Layer (EnLa), a novel approach that enhances prompt learning for vision-language models by incorporating learnable external layers and optimal transport for better generalization across tasks.
Contribution
The paper proposes a new paradigm called EnPrompt with an External Layer (EnLa). It introduces a learnable textual external layer built on pre-trained CLIP embeddings, learnable visual embeddings, and an optimal-transport-based alignment method to improve prompt learning for VLMs.
Findings
Outperforms existing prompt learning methods on multiple benchmarks.
Effective in base-to-novel generalization and few-shot learning.
Enhances cross-dataset and domain shift generalization.
Abstract
Prompt learning represents a promising method for adapting pre-trained vision-language models (VLMs) to various downstream tasks by learning a set of text embeddings. One challenge inherent to these methods is the poor generalization performance due to the invalidity of the learned text embeddings for unseen tasks. A straightforward approach to bridge this gap is to freeze the text embeddings in prompts, which results in a lack of capacity to adapt VLMs for downstream tasks. To address this dilemma, we propose a paradigm called EnPrompt with a novel External Layer (EnLa). Specifically, we propose a textual external layer and learnable visual embeddings for adapting VLMs to downstream tasks. The learnable external layer is built upon valid embeddings of pre-trained CLIP. This design considers the balance of learning capabilities between the two branches. To align the textual and visual…
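The idea in the abstract (keep the pre-trained text embeddings fixed, learn a layer on top of them, and align text with visual features) can be illustrated with a rough sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the residual linear form of `external_layer`, the mixing weight `alpha`, the cosine cost, the uniform marginals, and the use of Sinkhorn iterations as the optimal transport solver are all hypothetical choices, and random arrays stand in for CLIP features.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def external_layer(frozen_text, weight, bias, alpha=0.5):
    """Hypothetical external layer: a residual linear map applied on top of
    frozen text embeddings. Only `weight` and `bias` would be trained."""
    adapted = frozen_text @ weight + bias
    return l2_normalize(frozen_text + alpha * adapted)

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport between two uniform
    distributions, solved with plain Sinkhorn iterations (an assumed solver)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform mass on text features
    b = np.full(m, 1.0 / m)      # uniform mass on visual features
    K = np.exp(-cost / eps)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows toward marginal a
        v = b / (K.T @ u)        # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Random stand-ins for frozen CLIP text embeddings and visual embeddings.
rng = np.random.default_rng(0)
dim = 8
frozen_text = l2_normalize(rng.normal(size=(4, dim)))
visual = l2_normalize(rng.normal(size=(6, dim)))

# Zero-initialised parameters leave the pre-trained embeddings unchanged.
weight = np.zeros((dim, dim))
bias = np.zeros(dim)
text_feats = external_layer(frozen_text, weight, bias)

cost = 1.0 - text_feats @ visual.T       # cosine distance as transport cost
plan = sinkhorn_plan(cost)
ot_distance = float((plan * cost).sum()) # could serve as an alignment loss
print(ot_distance)
```

With zero-initialised `weight` and `bias`, the sketch starts exactly from the frozen embeddings, which is one simple way to keep the pre-trained representation intact while still leaving room to adapt. Whether EnLa is initialised or parameterised this way is not stated in the excerpt above.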
Taxonomy
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · ALIGN
