SEP: Self-Enhanced Prompt Tuning for Visual-Language Model

Hantao Yao; Rui Zhang; Lu Yu; Yongdong Zhang; Changsheng Xu

arXiv:2405.15549 · cs.CV · November 25, 2024 · 1 citation

TL;DR

SEP enhances prompt tuning for visual-language models by adaptively incorporating discriminative pre-trained tokens at each encoder layer, improving task performance and domain generalization.

Contribution

Introduces Self-Enhanced Prompt Tuning (SEP), a novel method that adaptively merges pre-trained tokens with learnable prompts to improve discrimination and generalization in VLMs.
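To make "merging pre-trained tokens with learnable prompts" concrete, here is a toy sketch of one encoder layer in plain Python. It is our own illustration and not the authors' code. Several details are assumptions rather than the paper's definitions: pre-trained tokens are selected by cosine similarity to an anchor token (imagined here as the class or [EOS] token), the selected tokens are mean-pooled, and a hypothetical blend weight `alpha` mixes the pooled vector into each learnable prompt. The actual selection and fusion rules are in the official htyao89/sep repository.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_enhanced_prompts(
    learnable: List[Vector],
    pretrained: List[Vector],
    anchor: Vector,
    k: int,
    alpha: float = 0.5,
) -> List[Vector]:
    """Hypothetical SEP-style fusion for ONE encoder layer (assumed rules).

    1. Rank this layer's frozen pre-trained tokens by cosine similarity
       to an anchor token (assumption: the class/[EOS] token).
    2. Mean-pool the top-k tokens into one input-specific vector.
    3. Blend that vector into every learnable prompt with weight alpha.
    """
    if not 1 <= k <= len(pretrained):
        raise ValueError("k must be between 1 and len(pretrained)")
    ranked = sorted(pretrained, key=lambda t: cosine(t, anchor), reverse=True)
    chosen = ranked[:k]
    dim = len(anchor)
    pooled = [sum(t[i] for t in chosen) / k for i in range(dim)]
    return [
        [(1 - alpha) * p[i] + alpha * pooled[i] for i in range(dim)]
        for p in learnable
    ]


if __name__ == "__main__":
    prompts = [[0.0, 0.0], [1.0, 1.0]]             # two learnable prompt tokens
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # frozen pre-trained tokens
    print(self_enhanced_prompts(prompts, tokens, anchor=[1.0, 0.0], k=2))
```

In this example, the two tokens closest to the anchor, `[1.0, 0.0]` and `[0.9, 0.1]`, are pooled to roughly `[0.95, 0.05]`. That pooled vector then shifts both learnable prompts toward it. Because the pooled vector is recomputed from each input's own tokens, the enhanced prompts differ from input to input, whereas plain learnable prompts do not.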

Findings

1. SEP outperforms existing prompt tuning methods on multiple benchmarks.
2. Self-enhanced tokens improve domain generalization and robustness to domain shift.
3. The approach captures input-specific knowledge (class-aware textual and instance-aware visual), yielding more discriminative textual and visual embeddings.

Abstract

Prompt tuning based on Context Optimization (CoOp) effectively adapts visual-language models (VLMs) to downstream tasks by inferring additional learnable prompt tokens. However, these tokens are less discriminative as they are independent of the pre-trained tokens and fail to capture input-specific knowledge, such as class-aware textual or instance-aware visual knowledge. Leveraging the discriminative and generalization capabilities inherent in pre-trained tokens, we introduce a novel approach named Self-Enhanced Prompt Tuning (SEP). The core principle of SEP involves adapting the learnable prompt tokens at each encoder layer from the corresponding self-pretrained tokens, thereby explicitly incorporating discriminative prior knowledge to enhance both textual-level and visual-level embeddings. Furthermore, SEP's self-enhanced tokens not only boost discrimination but also mitigate domain…
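The limitation the abstract attributes to CoOp is easiest to see in how CoOp assembles its text prompts. Below is a toy sketch in plain Python that follows CoOp's unified-context form `[V]_1 ... [V]_M [CLASS]`. It is our own illustration: made-up 2-D vectors stand in for real word embeddings, and the function name `build_coop_prompts` is hypothetical. Every class receives the same M learnable context vectors, so those tokens carry no class-aware or instance-aware information.

```python
from typing import Dict, List

Vector = List[float]


def build_coop_prompts(
    context: List[Vector],
    class_tokens: Dict[str, List[Vector]],
) -> Dict[str, List[Vector]]:
    """CoOp-style unified context: [V]_1 ... [V]_M [CLASS] for every class.

    The same M learnable context vectors are prepended to each class name's
    word embeddings; nothing in `context` depends on the class or the image.
    """
    return {
        name: [list(v) for v in context] + [list(t) for t in tokens]
        for name, tokens in class_tokens.items()
    }


if __name__ == "__main__":
    ctx = [[0.1, 0.2], [0.3, 0.4]]  # M = 2 learnable context vectors (toy values)
    classes = {"cat": [[1.0, 0.0]], "golden retriever": [[0.0, 1.0], [0.5, 0.5]]}
    prompts = build_coop_prompts(ctx, classes)
    for name, seq in prompts.items():
        print(name, len(seq), seq[:2] == ctx)  # prints "cat 3 True", then "golden retriever 4 True"
```

Only the class-name tokens differ between the two prompts; the shared prefix is identical. SEP targets exactly this gap by deriving its prompt tokens from each input's own pre-trained tokens.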

Code & Models

Repositories

htyao89/sep · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling