Advancing Prompt Learning through an External Layer

Fangming Cui; Xun Yang; Chao Wu; Liang Xiao; Xinmei Tian

arXiv:2407.19674 · cs.CV · November 18, 2024

TL;DR

This paper introduces EnPrompt with an External Layer (EnLa), a novel approach that enhances prompt learning for vision-language models by incorporating learnable external layers and optimal transport for better generalization across tasks.

Contribution

The paper proposes a new paradigm called EnPrompt with an External Layer (EnLa), introducing learnable external layers and a novel alignment method to improve prompt learning for VLMs.
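
The TL;DR names optimal transport as part of the alignment between the textual and visual branches. As a rough illustration only, the sketch below computes an entropy-regularized transport plan between a few text and image feature vectors with Sinkhorn iterations. The Sinkhorn solver, the cosine cost, the uniform marginals, and every name here (`cosine_cost`, `sinkhorn`, `ot_distance`, `eps`) are assumptions for illustration; this page does not say which solver or cost the paper uses.

```python
import math


def cosine_cost(text_feats, image_feats):
    """Cost matrix C[i][j] = 1 - cosine similarity (an assumed cost choice)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    t = [unit(v) for v in text_feats]
    m = [unit(v) for v in image_feats]
    return [[1.0 - sum(a * b for a, b in zip(ti, mj)) for mj in m] for ti in t]


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals via Sinkhorn scaling."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on text features (assumption)
    b = [1.0 / m] * m  # uniform mass on image features (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_distance(cost, plan):
    """Transport cost <C, P>, usable as a discrepancy between the two sets."""
    return sum(c * p for c_row, p_row in zip(cost, plan)
               for c, p in zip(c_row, p_row))


# Toy features: two text vectors and three image vectors in 2-D.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[1.0, 0.1], [0.1, 1.0], [1.0, 1.0]]
cost = cosine_cost(text_feats, image_feats)
plan = sinkhorn(cost)
distance = ot_distance(cost, plan)
```

In this toy setup the plan puts more mass on pairs whose features point in similar directions, and `distance` shrinks as the two feature sets line up. In a training loop such a quantity would typically be added to the loss, though how the paper does this is not described here.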

Findings

01 Outperforms existing prompt learning methods on multiple benchmarks.

02 Effective in base-to-novel generalization and few-shot learning.

03 Enhances cross-dataset and domain shift generalization.

Abstract

Prompt learning represents a promising method for adapting pre-trained vision-language models (VLMs) to various downstream tasks by learning a set of text embeddings. One challenge inherent to these methods is the poor generalization performance due to the invalidity of the learned text embeddings for unseen tasks. A straightforward approach to bridge this gap is to freeze the text embeddings in prompts, which results in a lack of capacity to adapt VLMs for downstream tasks. To address this dilemma, we propose a paradigm called EnPrompt with a novel External Layer (EnLa). Specifically, we propose a textual external layer and learnable visual embeddings for adapting VLMs to downstream tasks. The learnable external layer is built upon valid embeddings of pre-trained CLIP. This design considers the balance of learning capabilities between the two branches. To align the textual and visual…
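
The abstract describes a learnable external layer built on top of the embeddings of pre-trained CLIP, with the text embeddings themselves kept valid. A minimal sketch of that idea follows, assuming the external layer is a single linear map blended with the frozen embedding through a fixed ratio. The abstract does not give the layer's actual form, so the `ExternalLayer` class, the residual blend, and the `ratio` parameter are hypothetical.

```python
import random


class ExternalLayer:
    """Hypothetical learnable layer applied on top of a frozen text embedding."""

    def __init__(self, dim, ratio=0.5, seed=0):
        rng = random.Random(seed)
        # Trainable parameters (assumed: one small-initialized linear map).
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(dim)]
        self.bias = [0.0] * dim
        # Blend factor between adapted and frozen features (assumed).
        self.ratio = ratio

    def __call__(self, frozen_embedding):
        # Linear map of the frozen embedding; the embedding itself is not modified.
        adapted = [sum(w * x for w, x in zip(row, frozen_embedding)) + b
                   for row, b in zip(self.weight, self.bias)]
        # Residual blend keeps the output anchored to the pre-trained embedding.
        return [self.ratio * a + (1.0 - self.ratio) * x
                for a, x in zip(adapted, frozen_embedding)]


# Stand-in for a frozen CLIP text embedding (toy values, not real CLIP output).
frozen = [0.5, -0.5, 0.25, 1.0]
layer = ExternalLayer(dim=4, ratio=0.3)
adapted_embedding = layer(frozen)
```

With `ratio=0.0` the layer returns the frozen embedding unchanged, which is one way to see how such a design can keep pre-trained knowledge while still leaving room to adapt.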

Taxonomy

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training · ALIGN