Token Coordinated Prompt Attention is Needed for Visual Prompting

Zichen Liu; Xu Zou; Gang Hua; Jiahuan Zhou

arXiv:2505.02406 · cs.CV · May 8, 2025

TL;DR

This paper introduces Token Coordinated Prompt Attention (TCPA), a novel module that assigns specific prompts to different tokens in Vision Transformers, significantly improving their discriminative power and performance across benchmarks.

Contribution

The paper proposes a new TCPA module that assigns coordinated prompts to tokens, enhancing the discriminative ability and diversity of features in Vision Transformers.

Findings

1. TCPA significantly improves feature diversity and discriminative power.

2. Extensive experiments show TCPA outperforms existing methods across benchmarks.

3. The method effectively disentangles prompts for CLS and image tokens, enhancing attention interactions.

Abstract

Visual prompting techniques are widely used to efficiently fine-tune pretrained Vision Transformers (ViT) by learning a small set of shared prompts for all tokens. However, existing methods overlook the unique roles of different tokens in conveying discriminative information and interact with all tokens using the same prompts, which limits the representational capacity of ViT. This often leads to indistinguishable and biased prompt-extracted features, hindering performance. To address this issue, we propose a plug-and-play Token Coordinated Prompt Attention (TCPA) module, which assigns specific coordinated prompts to different tokens for attention-based interactions. First, recognizing the distinct functions of CLS and image tokens (global information aggregation and local feature extraction, respectively), we disentangle the prompts into CLS Prompts and Image Prompts, which interact exclusively…
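The disentangling idea in the abstract can be illustrated with an attention mask. The following is a minimal sketch, not the authors' implementation: the abstract is cut off before it states the exact interaction rule, so the sketch assumes that CLS Prompts are visible only to the CLS token and Image Prompts only to image tokens, while all tokens still attend to one another. The function names, the key layout, and the single-head, pure-Python attention are all hypothetical choices made for illustration.

```python
import math


def build_prompt_mask(n_cls_prompts, n_img_prompts, n_img_tokens):
    """Return mask[q][k]: True if query token q may attend to key k.

    Assumed key layout (hypothetical, for this sketch only):
        [CLS prompts | image prompts | CLS token | image tokens]
    Query layout: [CLS token | image tokens]
    """
    n_prompts = n_cls_prompts + n_img_prompts
    n_keys = n_prompts + 1 + n_img_tokens
    mask = []
    for q in range(1 + n_img_tokens):
        is_cls_query = (q == 0)
        row = []
        for k in range(n_keys):
            if k < n_cls_prompts:
                row.append(is_cls_query)        # CLS prompts: CLS token only
            elif k < n_prompts:
                row.append(not is_cls_query)    # image prompts: image tokens only
            else:
                row.append(True)                # ordinary tokens: visible to all
        mask.append(row)
    return mask


def masked_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention over plain Python lists."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q_vec, allowed in zip(queries, mask):
        # Disallowed keys get a score of -inf, so their softmax weight is 0.
        scores = [
            scale * sum(a * b for a, b in zip(q_vec, k_vec)) if ok else -math.inf
            for k_vec, ok in zip(keys, allowed)
        ]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs
```

In this sketch, a shared-prompt baseline corresponds to a mask that is True everywhere; the disentangled variant differs only in the prompt columns. A real ViT layer would apply the same mask per head on batched tensors rather than on Python lists.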

Code & Models

Repositories

zhoujiahuan1991/icml2025-tcpa
PyTorch · Official

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Currency Recognition and Detection

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training