Rethinking Visual Prompt Learning as Masked Visual Token Modeling

Ning Liao; Bowen Shi; Xiaopeng Zhang; Min Cao; Junchi Yan; Qi Tian

arXiv:2303.04998 · cs.CV · December 18, 2023 · 1 citation

TL;DR

This paper introduces VPTM, a novel visual prompt learning method that reformulates visual classification as masked token prediction on generative pre-trained models, enhancing performance and robustness.

Contribution

VPTM is the first method to adapt prompt learning to generative pre-trained visual models, unifying the pre-training and downstream tasks through masked token modeling.

Findings

1. VPTM outperforms existing visual prompt methods.
2. VPTM demonstrates robustness to prompt variations.
3. VPTM achieves high efficiency in visual classification.

Abstract

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates downstream tasks in the form of the generative pre-training task to achieve consistency, thereby stably improving performance. However, when transferred to the vision domain, current visual prompt learning methods are almost all designed on discriminative pre-trained models, and they lack a careful design for unifying the forms of the pre-training and downstream tasks. To explore prompt learning on generative pre-trained visual models while maintaining task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM), which transforms downstream visual classification into the pre-trained masked visual token prediction task. In addition, we develop the prototypical verbalizer for mapping the predicted visual token…
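
The abstract is cut off before it describes the prototypical verbalizer, so the following is only a minimal sketch of one way such a mapping could work, not the authors' implementation. It assumes that the model's prediction at the masked position is a probability vector over a visual-token vocabulary, that each class prototype is the mean of those vectors over labeled training examples, and that a new prediction is assigned to the class whose prototype has the largest dot product with it. The function names `build_prototypes` and `classify`, and the toy three-token vocabulary, are hypothetical.

```python
from collections import defaultdict


def build_prototypes(token_probs, labels):
    """Average the predicted token distributions of each class.

    token_probs: list of probability vectors over the visual-token vocabulary
                 (assumed output at the masked position; hypothetical setup).
    labels:      list of class labels, one per vector.
    """
    sums = {}
    counts = defaultdict(int)
    for probs, label in zip(token_probs, labels):
        if label not in sums:
            sums[label] = [0.0] * len(probs)
        sums[label] = [s + p for s, p in zip(sums[label], probs)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in vec]
        for label, vec in sums.items()
    }


def classify(probs, prototypes):
    """Return the label whose prototype has the largest dot product with probs."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return max(prototypes, key=lambda label: dot(probs, prototypes[label]))


# Toy data: a 3-token vocabulary and two classes (illustrative values only).
train_probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
train_labels = ["cat", "cat", "dog", "dog"]

prototypes = build_prototypes(train_probs, train_labels)
prediction = classify([0.6, 0.3, 0.1], prototypes)
```

In this sketch the visual tokens carry no explicit class meaning on their own; the prototypes are what tie a predicted token distribution to a downstream label. The similarity measure and the averaging step are design assumptions and could be replaced by whatever the paper actually specifies.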

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques