SEP: Self-Enhanced Prompt Tuning for Visual-Language Model
Hantao Yao, Rui Zhang, Lu Yu, Yongdong Zhang, Changsheng Xu

TL;DR
SEP enhances prompt tuning for visual-language models by adaptively incorporating discriminative pre-trained tokens at each encoder layer, improving task performance and domain generalization.
Contribution
Introduces Self-Enhanced Prompt Tuning (SEP), a novel method that adaptively merges pre-trained tokens with learnable prompts to improve discrimination and generalization in VLMs.
Findings
SEP outperforms existing prompt tuning methods on multiple benchmarks.
Self-enhanced tokens improve domain adaptation and robustness.
The approach effectively captures input-specific knowledge for better embeddings.
Abstract
Prompt tuning based on Context Optimization (CoOp) effectively adapts visual-language models (VLMs) to downstream tasks by inferring additional learnable prompt tokens. However, these tokens are less discriminative as they are independent of the pre-trained tokens and fail to capture input-specific knowledge, such as class-aware textual or instance-aware visual knowledge. Leveraging the discriminative and generalization capabilities inherent in pre-trained tokens, we introduce a novel approach named Self-Enhanced Prompt Tuning (SEP). The core principle of SEP involves adapting the learnable prompt tokens at each encoder layer from the corresponding self-pretrained tokens, thereby explicitly incorporating discriminative prior knowledge to enhance both textual-level and visual-level embeddings. Furthermore, SEP's self-enhanced tokens not only boost discrimination but also mitigate domain…
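The abstract's core idea, adapting each layer's learnable prompt tokens from that layer's pre-trained tokens, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `self_enhance`, the top-k selection, the scaled dot-product scoring, and the mixing weight `alpha` are all assumptions chosen for illustration, since the abstract does not specify how the tokens are merged.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_enhance(prompts, pretrained, top_k=2, alpha=0.5):
    """Hypothetical fusion step for one encoder layer.

    prompts:    (P, D) learnable prompt tokens
    pretrained: (N, D) tokens produced by the frozen encoder at the same layer
    top_k, alpha: illustrative hyperparameters (assumptions, not from the paper)
    """
    d = prompts.shape[-1]
    # Similarity between each prompt token and each pre-trained token.
    scores = prompts @ pretrained.T / np.sqrt(d)              # (P, N)
    # Keep only the top_k most related pre-trained tokens per prompt.
    idx = np.argsort(scores, axis=-1)[:, -top_k:]             # (P, k)
    top_scores = np.take_along_axis(scores, idx, axis=-1)     # (P, k)
    weights = softmax(top_scores, axis=-1)                    # (P, k)
    selected = pretrained[idx]                                # (P, k, D)
    # Weighted average of the selected tokens acts as the prior.
    prior = (weights[..., None] * selected).sum(axis=1)       # (P, D)
    # Blend the learnable prompt with the pre-trained prior.
    return (1.0 - alpha) * prompts + alpha * prior

rng = np.random.default_rng(0)
prompts = rng.normal(size=(4, 8))       # 4 prompt tokens, width 8
pretrained = rng.normal(size=(16, 8))   # 16 pre-trained tokens, width 8
enhanced = self_enhance(prompts, pretrained)
print(enhanced.shape)
```

In this sketch, the enhanced tokens would replace the plain learnable prompts before the next encoder layer, so the pre-trained prior is re-injected at every layer rather than only at the input.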
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling
