CVPT: Cross Visual Prompt Tuning

Lingyun Huang; Jianxu Mao; Junfei Yi; Ziming Tao; Yaonan Wang

arXiv:2408.14961 · cs.CV · July 22, 2025



TL;DR

CVPT adds a cross-attention module to visual prompt tuning so that prompts interact with image tokens without being inserted into the input sequence. This preserves the pre-trained self-attention and improves on VPT across diverse datasets.

Contribution

This work proposes Cross Visual Prompt Tuning (CVPT), a prompt-based fine-tuning method whose cross-attention module models the interaction between prompts and image tokens while leaving the original self-attention intact.

Findings

1. CVPT outperforms VPT on 25 datasets, including an accuracy gain of more than 4% on VTAB-1K.

2. CVPT rivals leading adapter-based methods in both performance and efficiency.

3. The code for CVPT is publicly available on GitHub (xlgsyzp/cvpt).

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has emerged to mitigate the computational demands of large-scale models. Within computer vision, adapter-based PEFT methods are often favored over prompt-based approaches like Visual Prompt Tuning (VPT) due to the latter's performance and efficiency limitations. Our analysis reveals that VPT's shortcomings stem from its prompt deployment strategy, which can distort the model's inherent self-attention mechanism. To address this, we propose Cross Visual Prompt Tuning (CVPT). CVPT introduces a cross-attention module to directly model interactions between prompts and image tokens. This design decouples the prompts from the input sequence, preserving the original self-attention integrity while enabling efficient feature integration. Furthermore, we employ a weight-sharing mechanism for cross-attention initialization, which enhances representative…
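The contrast the abstract draws between VPT and CVPT can be illustrated with a small sketch. This is not the authors' implementation: it is a single-head, pure-Python toy that assumes image tokens act as queries and prompts as keys and values, that the cross-attention output is added back as a residual, and that "weight sharing" means copying the self-attention projections at initialization. All names (attention, sa_weights, ca_weights, and the sizes) are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention of `queries` over `context`."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)

def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim, num_tokens, num_prompts = 4, 6, 3          # hypothetical toy sizes
tokens = random_matrix(num_tokens, dim, rng)    # stand-in for image tokens
prompts = random_matrix(num_prompts, dim, rng)  # would be learnable in practice

# Stand-in for the frozen pre-trained self-attention projections (Q, K, V).
sa_weights = [random_matrix(dim, dim, rng) for _ in range(3)]
# Assumed weight sharing: cross-attention starts from copies of those projections.
ca_weights = [[row[:] for row in w] for w in sa_weights]

# VPT-style: prompts are concatenated into the sequence, so every softmax
# in self-attention is also spread over the prompt positions.
vpt_sequence = prompts + tokens
vpt_out = attention(vpt_sequence, vpt_sequence, *sa_weights)

# CVPT-style (as assumed here): self-attention sees image tokens only,
# then a separate cross-attention step lets tokens read from the prompts.
sa_out = add(tokens, attention(tokens, tokens, *sa_weights))
cvpt_out = add(sa_out, attention(sa_out, prompts, *ca_weights))

print(len(vpt_out), len(cvpt_out))
```

In this toy setup the VPT-style sequence grows by the number of prompts, while the CVPT-style path keeps the sequence at its original length and leaves the self-attention computation independent of the prompts.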


Code & Models

Repositories

xlgsyzp/cvpt (PyTorch, official)


Taxonomy

Topics: EEG and Brain-Computer Interfaces