Token Coordinated Prompt Attention is Needed for Visual Prompting
Zichen Liu, Xu Zou, Gang Hua, Jiahuan Zhou

TL;DR
This paper introduces Token Coordinated Prompt Attention (TCPA), a plug-and-play module that assigns specific coordinated prompts to different tokens in Vision Transformers instead of sharing one prompt set across all tokens. The authors report that this improves the discriminative power of the extracted features and performance across benchmarks.
Contribution
The paper proposes a new TCPA module that assigns coordinated prompts to tokens, enhancing the discriminative ability and diversity of features in Vision Transformers.
Findings
TCPA significantly improves feature diversity and discriminative power.
Extensive experiments show TCPA outperforms existing methods across benchmarks.
The method effectively disentangles prompts for CLS and image tokens, enhancing attention interactions.
Abstract
Visual prompting techniques are widely used to efficiently fine-tune pretrained Vision Transformers (ViT) by learning a small set of shared prompts for all tokens. However, existing methods overlook the unique roles of different tokens in conveying discriminative information and interact with all tokens using the same prompts, thereby limiting the representational capacity of ViT. This often leads to indistinguishable and biased prompt-extracted features, hindering performance. To address this issue, we propose a plug-and-play Token Coordinated Prompt Attention (TCPA) module, which assigns specific coordinated prompts to different tokens for attention-based interactions. Firstly, recognizing the distinct functions of CLS and image tokens (global information aggregation and local feature extraction, respectively), we disentangle the prompts into CLS Prompts and Image Prompts, which interact exclusively…
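The disentangling step can be pictured as an attention visibility mask. The sketch below is illustrative only and is not the authors' implementation: it assumes that CLS Prompts are visible only to the CLS token and Image Prompts only to image tokens, which is one reading of the truncated sentence above. The function names, the key layout, and the prompt counts are all hypothetical.

```python
import math


def build_prompt_mask(n_cls_prompts, n_img_prompts, n_img_tokens):
    """Return mask[q][k], True when query q may attend to key k.

    Assumed key layout:   [CLS prompts | image prompts | CLS token | image tokens]
    Assumed query layout: [CLS token | image tokens]
    """
    n_keys = n_cls_prompts + n_img_prompts + 1 + n_img_tokens
    mask = []
    for q in range(1 + n_img_tokens):
        is_cls_query = q == 0
        row = []
        for k in range(n_keys):
            if k < n_cls_prompts:
                # Assumption: CLS prompts are seen by the CLS token only.
                row.append(is_cls_query)
            elif k < n_cls_prompts + n_img_prompts:
                # Assumption: image prompts are seen by image tokens only.
                row.append(not is_cls_query)
            else:
                # Ordinary tokens stay visible to every query.
                row.append(True)
        mask.append(row)
    return mask


def masked_softmax(scores, visible):
    """Softmax over one row of scores, giving zero weight to hidden keys."""
    peak = max(s for s, v in zip(scores, visible) if v)
    exps = [math.exp(s - peak) if v else 0.0 for s, v in zip(scores, visible)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: 2 CLS prompts, 3 image prompts, 4 image tokens.
mask = build_prompt_mask(2, 3, 4)
cls_weights = masked_softmax([0.0] * 10, mask[0])
```

With uniform scores, the CLS query spreads its attention over its two CLS prompts and the five ordinary tokens, and places zero weight on the three image prompts. A shared-prompt baseline would correspond to a mask that is True everywhere.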
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Currency Recognition and Detection
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
