Understanding and Improving Visual Prompting: A Label-Mapping Perspective

Aochuan Chen; Yuguang Yao; Pin-Yu Chen; Yihua Zhang; Sijia Liu

arXiv:2211.11635 · cs.CV · March 28, 2023 · 5 cites


Open Access · 1 Repo

TL;DR

This paper investigates the role of label mapping in visual prompting for vision tasks, proposing a new iterative framework that enhances target task accuracy by optimizing label mappings, especially when combined with CLIP models.

Contribution

The paper introduces ILM-VP, an iterative label mapping framework that improves visual prompting accuracy, and integrates label mapping with CLIP to enhance text prompt selection.
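This page does not spell out how the label mapping is computed, so the following is only a minimal sketch under stated assumptions: it uses a greedy, frequency-based one-to-one assignment between source and target classes, computed from the frozen model's predictions on prompted target data. In an iterative scheme, such a mapping would be recomputed after each update of the prompt. The function names `frequency_label_mapping` and `mapped_accuracy` are hypothetical and are not taken from the optml-group/ilm-vp repository.

```python
from collections import Counter


def frequency_label_mapping(source_preds, target_labels):
    """Greedily assign one distinct source class to each target class.

    source_preds[i]  : source-class index predicted for prompted sample i
    target_labels[i] : ground-truth target-class index of sample i
    Returns a dict {target class: source class}. A target class stays
    unmapped if every source class it co-occurs with is already taken.
    """
    # Count how often each (target class, source class) pair co-occurs.
    counts = Counter(zip(target_labels, source_preds))
    mapping = {}
    used_sources = set()
    # Visit pairs from most to least frequent; ties are broken by index.
    for (target, source), _ in sorted(counts.items(),
                                      key=lambda kv: (-kv[1], kv[0])):
        if target not in mapping and source not in used_sources:
            mapping[target] = source
            used_sources.add(source)
    return mapping


def mapped_accuracy(source_preds, target_labels, mapping):
    """Fraction of samples whose predicted source class maps to the true target class."""
    source_to_target = {s: t for t, s in mapping.items()}
    correct = sum(
        1 for pred, label in zip(source_preds, target_labels)
        if source_to_target.get(pred) == label
    )
    return correct / len(target_labels)


# Toy data (made up for illustration): six prompted samples, two target classes.
source_preds = [5, 5, 7, 7, 7, 2]
target_labels = [0, 0, 1, 1, 0, 1]
mapping = frequency_label_mapping(source_preds, target_labels)
accuracy = mapped_accuracy(source_preds, target_labels, mapping)
```

With the toy data, target class 0 is paired with source class 5 and target class 1 with source class 7, since those pairs co-occur most often. Predictions that land on an unmapped source class count as errors.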

Findings

01. ILM-VP significantly outperforms existing VP methods.

02. Reprogramming a pre-trained ResNet-18 with ILM-VP yields an accuracy gain of up to 7.9% over existing VP methods.

03. Applied to CLIP-based VP, the proposed label mapping improves accuracy by up to 13.7%.

Abstract

We revisit and advance visual prompting (VP), an input prompting technique for vision tasks. VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the target domain by simply incorporating universal prompts (in terms of input perturbation patterns) into downstream data points. Yet, it remains elusive why VP stays effective even given a ruleless label mapping (LM) between the source classes and the target classes. Inspired by the above, we ask: How is LM interrelated with VP? And how to exploit such a relationship to improve its accuracy on target tasks? We peer into the influence of LM on VP and provide an affirmative answer that a better 'quality' of LM (assessed by mapping precision and explanation) can consistently improve the effectiveness of VP. This is in contrast to the prior art where the factor of LM was missing. To optimize LM, we propose a new…
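To make the idea of a universal input perturbation concrete, here is a minimal sketch in plain Python. It assumes the prompt is an additive pattern restricted to a border around the image. The abstract only says "input perturbation patterns", so the border shape, the clipping to [0, 1], and the name `apply_border_prompt` are illustrative assumptions rather than the authors' implementation. The same `prompt` would be shared by every downstream image while the source model stays fixed.

```python
def apply_border_prompt(image, prompt, border):
    """Add a shared perturbation to the outer `border` pixels of a 2-D image.

    image, prompt : lists of rows of floats with identical shape (H x W)
    border        : width in pixels of the frame that receives the prompt
    Pixel values are clipped to [0, 1] after the prompt is added.
    """
    height, width = len(image), len(image[0])
    prompted = []
    for r in range(height):
        row = []
        for c in range(width):
            on_border = (r < border or r >= height - border
                         or c < border or c >= width - border)
            value = image[r][c] + (prompt[r][c] if on_border else 0.0)
            row.append(min(1.0, max(0.0, value)))
        prompted.append(row)
    return prompted


# Toy 4x4 grayscale image and prompt (values made up for illustration).
image = [[0.5] * 4 for _ in range(4)]
prompt = [[0.25] * 4 for _ in range(4)]
prompted = apply_border_prompt(image, prompt, border=1)
```

In a real setup the prompt values would be trainable parameters updated by gradient descent on the target-task loss; here they are fixed numbers so that only the application step is shown.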


Code & Models

Repositories

optml-group/ilm-vp (PyTorch, official)


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training