TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
Hantao Yao, Rui Zhang, Changsheng Xu

TL;DR
This paper introduces TCP, a prompt tuning method that incorporates class-aware textual knowledge to improve the generalization of visual-language models to unseen domains, achieving superior performance with less training time.
Contribution
The paper proposes a novel Textual Knowledge Embedding (TKE) module and the TCP framework built on it, which dynamically generates class-aware prompts to enhance domain generalization in prompt tuning for VLMs.
Findings
TCP outperforms existing methods on various benchmarks.
TKE is a versatile plug-and-play module.
TCP requires less training time than comparable approaches.
Abstract
Prompt tuning represents a valuable technique for adapting pre-trained visual-language models (VLMs) to various downstream tasks. Recent advancements in CoOp-based methods propose a set of learnable domain-shared or image-conditional textual tokens to facilitate the generation of task-specific textual classifiers. However, these textual tokens have a limited generalization ability regarding unseen domains, as they cannot dynamically adjust to the distribution of testing classes. To tackle this issue, we present a novel Textual-based Class-aware Prompt tuning (TCP) that explicitly incorporates prior knowledge about classes to enhance their discriminability. The critical concept of TCP involves leveraging Textual Knowledge Embedding (TKE) to map the high generalizability of class-level textual knowledge into class-aware textual tokens. By seamlessly integrating these class-aware prompts…
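The abstract's central idea, mapping class-level textual knowledge into class-aware tokens, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract as excerpted here does not specify the TKE architecture, so the two-layer projection, the ReLU, the dimensions, the random stand-in embeddings, and the choice to concatenate the class-aware tokens with the shared learnable tokens are all hypothetical, not the authors' implementation.

```python
import numpy as np

def textual_knowledge_embedding(class_emb, w_down, w_up, n_tokens):
    """Hypothetical TKE sketch: project class-level text embeddings
    into a set of class-aware prompt tokens, one set per class."""
    hidden = np.maximum(class_emb @ w_down, 0.0)   # (n_classes, d_mid), ReLU bottleneck
    tokens = hidden @ w_up                         # (n_classes, n_tokens * d_token)
    return tokens.reshape(class_emb.shape[0], n_tokens, -1)

rng = np.random.default_rng(0)
n_classes, d_text, d_mid, n_tokens, d_token = 5, 16, 4, 2, 8

# Stand-in for frozen text-encoder embeddings of the class names (assumed, random here).
class_emb = rng.normal(size=(n_classes, d_text))

# Learnable parameters in a real setup; random placeholders in this sketch.
w_down = rng.normal(scale=0.1, size=(d_text, d_mid))
w_up = rng.normal(scale=0.1, size=(d_mid, n_tokens * d_token))
shared_ctx = rng.normal(size=(n_tokens, d_token))  # domain-shared tokens, CoOp-style

class_tokens = textual_knowledge_embedding(class_emb, w_down, w_up, n_tokens)

# Assumed combination rule: shared tokens first, then the class-aware tokens.
shared = np.broadcast_to(shared_ctx, (n_classes, n_tokens, d_token))
prompts = np.concatenate([shared, class_tokens], axis=1)

print(prompts.shape)  # (5, 4, 8)
```

Because the class-aware tokens are computed from each class's own embedding rather than stored as free parameters, the same projection can be applied to class names never seen during training, which is the property the abstract attributes to TCP.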
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Subtitles and Audiovisual Media
Methods: Sparse Evolutionary Training
