Visual Prompt Tuning

Menglin Jia; Luming Tang; Bor-Chun Chen; Claire Cardie; Serge Belongie; Bharath Hariharan; Ser-Nam Lim

arXiv:2203.12119 · cs.CV · July 21, 2022

TL;DR

Visual Prompt Tuning (VPT) is a parameter-efficient method for adapting large pre-trained vision Transformers: it trains only a small number of parameters added in the input space, keeps the backbone frozen, and in many cases outperforms full fine-tuning.

Contribution

VPT adds a small number of trainable parameters (less than 1% of model parameters) in the input space of a vision Transformer and keeps the backbone frozen, while matching or exceeding the performance of full fine-tuning in many cases.

Findings

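01. VPT achieves significant performance gains over other parameter-efficient tuning protocols on a wide variety of downstream recognition tasks.

02. VPT outperforms full fine-tuning in many cases, across model capacities and training data scales.

03. VPT reduces per-task storage cost, because only the small set of trained parameters is stored for each task while the frozen backbone is reused.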
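
The storage claim can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the backbone size, embedding width, layer count, prompt count, class count, and the choice to place prompts at every layer are all assumptions picked for the example, not figures taken from this page.

```python
# Illustrative sizes only; every constant here is an assumption.
BACKBONE_PARAMS = 86_000_000  # rough size of a ViT-B/16-like backbone
HIDDEN_DIM = 768              # token embedding width
NUM_LAYERS = 12               # number of Transformer layers
BYTES_PER_PARAM = 4           # float32 storage


def prompt_params(num_prompts, prompted_layers=1):
    """One HIDDEN_DIM-sized vector per prompt, per prompted layer."""
    return num_prompts * HIDDEN_DIM * prompted_layers


def head_params(num_classes):
    """Linear classification head: weight matrix plus bias."""
    return HIDDEN_DIM * num_classes + num_classes


def storage_mb(num_tasks, per_task_params, shared_params=0):
    """Megabytes needed for one shared copy plus per-task parameters."""
    total = shared_params + num_tasks * per_task_params
    return total * BYTES_PER_PARAM / 1e6


# Hypothetical setting: 50 prompts at every layer, 100 classes, 10 tasks.
prompts = prompt_params(50, prompted_layers=NUM_LAYERS)
head = head_params(100)
prompt_fraction = prompts / BACKBONE_PARAMS

# Full fine-tuning stores a complete backbone copy (plus head) per task.
full_per_task = BACKBONE_PARAMS + head
# Prompt tuning stores only prompts and head per task; the backbone is shared.
vpt_per_task = prompts + head

full_mb = storage_mb(10, full_per_task)
vpt_mb = storage_mb(10, vpt_per_task, shared_params=BACKBONE_PARAMS)

print(f"prompt parameters: {prompts} ({prompt_fraction:.2%} of backbone)")
print(f"full fine-tuning, 10 tasks: {full_mb:.1f} MB")
print(f"prompt tuning, 10 tasks:    {vpt_mb:.1f} MB")
```

Under these assumed sizes the prompts stay below 1% of the backbone, and the gap between the two storage totals widens as the number of tasks grows, since each extra task adds only prompts and a head rather than a full model copy.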

Abstract

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
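
To illustrate what "trainable parameters in the input space" with a frozen backbone can look like, here is a minimal, dependency-free sketch. It is not the authors' implementation: the tiny dimensions, the helper names (`rand_vec`, `build_input_sequence`, `trainable_groups`), and the placement of the prompts between the class token and the patch embeddings are assumptions made for the example.

```python
import random

# Toy sizes chosen for readability; all are assumptions.
DIM = 8          # token embedding width
NUM_PATCHES = 4  # image patches after patch embedding
NUM_PROMPTS = 2  # learned prompt tokens

rng = random.Random(0)


def rand_vec():
    """Small random vector standing in for an embedding."""
    return [rng.gauss(0.0, 0.02) for _ in range(DIM)]


# Stand-ins for frozen, pre-trained pieces of the backbone.
cls_token = rand_vec()
patch_embeddings = [rand_vec() for _ in range(NUM_PATCHES)]

# The only new input-space parameters: one vector per prompt token.
prompts = [rand_vec() for _ in range(NUM_PROMPTS)]


def build_input_sequence(cls_tok, prompt_vecs, patches):
    """Insert the prompt tokens between the class token and the patches."""
    return [cls_tok] + prompt_vecs + patches


# Which parameter groups an optimizer would be allowed to update.
param_groups = {
    "backbone": {"trainable": False},  # frozen pre-trained weights
    "prompts": {"trainable": True},    # learned per task
    "head": {"trainable": True},       # task-specific classifier
}


def trainable_groups(groups):
    """Names of the groups that receive gradient updates."""
    return sorted(name for name, cfg in groups.items() if cfg["trainable"])


sequence = build_input_sequence(cls_token, prompts, patch_embeddings)
print("sequence length:", len(sequence))
print("trainable groups:", trainable_groups(param_groups))
```

The backbone then processes a sequence that is longer by `NUM_PROMPTS` tokens, and during training only the prompt vectors and the task head would be updated.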

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout