Revisiting the Power of Prompt for Visual Tuning
Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

TL;DR
This paper introduces a prompt initialization and construction method for visual prompt tuning that significantly improves performance, especially when the backbone was pretrained with self-supervision, at minimal additional computational cost.
Contribution
The study proposes prompt initialization with downstream token prototypes and a streamlined token construction pipeline, enhancing VPT performance and robustness across various tasks.
Findings
Outperforms existing methods on 19 out of 24 tasks
Achieves 10-30% performance gains in self-supervised pretraining
Requires less than 0.4% learnable parameters
Abstract
Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt length, and subpar performance in self-supervised pretraining, hindering successful contextual adaptation. This study commences by exploring the correlation evolvement between prompts and patch tokens during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. The strategic initialization, a stand-in for the previous initialization, substantially improves performance in fine-tuning. To refine further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no…
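The idea of initializing prompts from downstream token prototypes can be illustrated with a minimal sketch. The abstract does not say how the prototypes are computed, so the k-means clustering here is an assumption, as are the function name `kmeans_prototypes`, the array shapes, and the random data standing in for patch-token embeddings from a frozen backbone.

```python
import numpy as np

def kmeans_prototypes(tokens, num_prompts, num_iters=10, seed=0):
    """Cluster patch-token embeddings and return the centroids as prototypes.

    Hypothetical helper: k-means is one plausible way to form prototypes,
    not necessarily the procedure used in the paper.
    """
    rng = np.random.default_rng(seed)
    # Start from randomly chosen tokens so every centroid lies in token space.
    idx = rng.choice(tokens.shape[0], size=num_prompts, replace=False)
    centroids = tokens[idx].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from each token to each centroid: (N, K).
        dists = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_prompts):
            members = tokens[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids

# Random stand-in for patch tokens extracted from downstream images
# (512 tokens, embedding dimension 64; both sizes are assumptions).
rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(512, 64)).astype(np.float32)

# These prototypes would replace a random initialization of the prompt tokens.
prompt_init = kmeans_prototypes(patch_tokens, num_prompts=8)
print(prompt_init.shape)  # (8, 64)
```

In a real setup the resulting array would be copied into the learnable prompt parameters before fine-tuning, so the prompts begin in the same region of embedding space as the patch tokens they attend to.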
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
