Revisiting the Power of Prompt for Visual Tuning

Yuzhu Wang; Lechao Cheng; Chaowei Fang; Dingwen Zhang; Manni Duan; Meng Wang

arXiv:2402.02382·cs.CV·May 28, 2024·1 cites

Open Access

TL;DR

This paper introduces a prompt initialization and construction method for visual prompt tuning that substantially improves performance, especially when adapting self-supervised pretrained models, at minimal additional computational cost.

Contribution

The study proposes prompt initialization with downstream token prototypes and a streamlined token construction pipeline, enhancing VPT performance and robustness across various tasks.

Findings

01

Surpasses full fine-tuning on 19 out of 24 tasks

02

Achieves 10-30% task performance gains when adapting self-supervised pretrained models

03

Requires less than 0.4% learnable parameters

Abstract

Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt length, and subpar performance in self-supervised pretraining, hindering successful contextual adaptation. This study commences by exploring the correlation evolvement between prompts and patch tokens during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. The strategic initialization, a stand-in for the previous initialization, substantially improves performance in fine-tuning. To refine further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no…
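The idea of initializing prompts with downstream token prototypes can be illustrated with a minimal sketch. The abstract shown here does not say how the prototypes are computed, so the sketch assumes plain k-means clustering over patch-token embeddings and uses the cluster centroids as the initial prompt values. The function name `prototype_prompts`, its parameters, and the random stand-in embeddings are all hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def prototype_prompts(patch_tokens, num_prompts, iters=10, seed=0):
    """Return `num_prompts` centroids of the patch tokens (plain k-means).

    patch_tokens: (N, D) array of patch embeddings gathered from downstream
    images. How those embeddings are collected is an assumption of this
    sketch, not something the abstract specifies.
    """
    rng = np.random.default_rng(seed)
    tokens = np.asarray(patch_tokens, dtype=np.float64)

    # Start from randomly chosen, distinct tokens.
    idx = rng.choice(len(tokens), size=num_prompts, replace=False)
    centroids = tokens[idx].copy()

    for _ in range(iters):
        # Squared Euclidean distance from every token to every centroid: (N, K).
        dists = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_prompts):
            members = tokens[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


# Stand-in data: 8 images x 196 patches, 32-dimensional embeddings.
demo_tokens = np.random.default_rng(0).normal(size=(8 * 196, 32))
demo_prompts = prototype_prompts(demo_tokens, num_prompts=4)
print(demo_prompts.shape)  # (4, 32)
```

In a real pipeline the returned array would be copied into the learnable prompt parameters before fine-tuning begins, in place of a random initialization. Other prototype choices, such as pooled averages over groups of patches, would fit the same interface.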

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications