Attention to the Burstiness in Visual Prompt Tuning!

Yuzhu Wang; Manni Duan; Shu Kong

arXiv:2506.22908·cs.CV·August 19, 2025

TL;DR

This paper introduces Bilinear Prompt Tuning (BPT), a novel method that uses whitening and low-rank bilinear models to improve prompt tuning in vision Transformers, significantly boosting accuracy and efficiency.

Contribution

It proposes a whitening-based approach and a low-rank bilinear model for prompt tuning, addressing burstiness and distribution challenges in VPT, leading to faster and more accurate training.
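As a rough illustration of why a low-rank factorization can cut the number of learnable prompt parameters, the sketch below builds a prompt matrix as the product of two small factors. The sizes (`num_prompts`, `dim`, `rank`), the names `a` and `b`, and the initialization are assumptions chosen for illustration; they are not taken from the paper and this is not the authors' implementation.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_prompts, dim, rank = 100, 768, 8

rng = np.random.default_rng(0)

# Two small factors; their product is bilinear in (a, b).
a = rng.normal(scale=0.02, size=(num_prompts, rank))
b = rng.normal(scale=0.02, size=(rank, dim))

# The full prompt matrix is never stored as a free parameter;
# it is recomputed from the factors.
prompts = a @ b

full_params = num_prompts * dim        # parameters of an unfactored prompt matrix
low_rank_params = a.size + b.size      # parameters of the two factors

print(prompts.shape, full_params, low_rank_params)
```

Under these assumed sizes, the factored form stores `rank * (num_prompts + dim)` values instead of `num_prompts * dim`, which is smaller whenever `rank` is well below both dimensions.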

Findings

01. BPT significantly improves accuracy, e.g., +25 points on the CUB dataset.

02. BPT outperforms existing VPT methods across multiple benchmarks.

03. BPT reduces parameter count and computational overhead.

Abstract

Visual Prompt Tuning (VPT) is a parameter-efficient fine-tuning technique that adapts a pre-trained vision Transformer (ViT) by learning a small set of parameters in the input space, known as prompts. In VPT, we uncover "burstiness" in the values arising from the interaction of image patch embeddings and the key and query projectors within the Transformer's self-attention module. Furthermore, the values of patch embeddings and the key and query projectors exhibit Laplacian and hyper-Laplacian distributions, respectively. Intuitively, these non-Gaussian distributions pose challenges for learning prompts. To address this, we propose whitening these data, de-correlating them and equalizing their variance towards more Gaussian before learning prompts. We derive the whitening matrix over random image patch embeddings and ViT's key and query projectors, and multiply it with the prompt to be…
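The whitening step described above (de-correlating features and equalizing their variance) can be sketched generically as follows. This is a minimal ZCA-whitening example on synthetic data, not the paper's procedure: the helper name `zca_whitening_matrix`, the `eps` value, and the Laplacian stand-in data are all assumptions, and how the resulting matrix is combined with the prompt is not shown because the abstract is truncated at that point.

```python
import numpy as np

def zca_whitening_matrix(x, eps=1e-8):
    """Return (mean, w) such that (x - mean) @ w has roughly identity covariance.

    x is an (n_samples, n_features) array. Hypothetical helper for illustration.
    """
    mean = x.mean(axis=0, keepdims=True)
    xc = x - mean
    # Sample covariance of the centered features.
    cov = xc.T @ xc / (x.shape[0] - 1)
    # Symmetric eigendecomposition: cov = V diag(lambda) V^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # ZCA whitening matrix: V diag(1 / sqrt(lambda + eps)) V^T.
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, w

rng = np.random.default_rng(0)

# Stand-in data: heavy-tailed (Laplacian) features mixed to be correlated.
mixing = rng.normal(size=(8, 8))
x = rng.laplace(size=(5000, 8)) @ mixing

mean, w = zca_whitening_matrix(x)
x_white = (x - mean) @ w

# After whitening, the sample covariance should be close to the identity.
cov_white = np.cov(x_white, rowvar=False)
print(np.round(cov_white, 3))
```

Whitening removes second-order structure (correlations and unequal variances) only; it does not by itself make heavy-tailed marginals Gaussian, which is consistent with the abstract's wording "towards more Gaussian".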

Code & Models

Repositories

WangYZ1608/BPT (PyTorch, official)

Taxonomy

Topics: Cognitive Science and Education Research · Design Education and Practice

Methods: Dropout · Absolute Position Encodings · Byte Pair Encoding · Softmax · Label Smoothing · Transformer · Sparse Evolutionary Training · Dense Connections · Layer Normalization · Vision Transformer