Exploring Visual Prompts for Adapting Large-Scale Models
Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola

TL;DR
This paper explores visual prompting as a way to adapt large-scale vision models such as CLIP: a single learned image perturbation lets a frozen model perform a new task. The approach achieves performance competitive with standard linear probes and is robust to distribution shift.
Contribution
It investigates visual prompting for large-scale vision models, demonstrates its effectiveness and robustness, and analyzes how properties of the downstream dataset, prompt design, and output transformation affect adaptation performance.
Findings
Visual prompting is particularly effective for CLIP.
It is robust to distribution shift.
Its performance is competitive with standard linear probes.
Abstract
We investigate the efficacy of visual prompting to adapt large-scale models in vision. Following the recent approach from prompt tuning and adversarial reprogramming, we learn a single image perturbation such that a frozen model prompted with this perturbation performs a new task. Through comprehensive experiments, we demonstrate that visual prompting is particularly effective for CLIP and robust to distribution shift, achieving performance competitive with standard linear probes. We further analyze properties of the downstream dataset, prompt design, and output transformation in regard to adaptation performance. The surprising effectiveness of visual prompting provides a new perspective on adapting pre-trained models in vision. Code is available at http://hjbahng.github.io/visual_prompting .
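The core idea in the abstract, learning one input perturbation while the model stays frozen, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "frozen model" is a made-up linear scorer over 4-pixel inputs (`FROZEN_W`), the dataset `DATA` is hypothetical, and the prompt is a simple additive perturbation trained by plain gradient descent on cross-entropy. For the paper's actual code, see the project URL in the abstract.

```python
import math

# Hypothetical frozen "pre-trained" model: a fixed linear scorer over
# 4-pixel inputs. These weights are invented and are never updated.
FROZEN_W = [[1.0, -1.0, 0.5, 0.0],
            [-1.0, 1.0, 0.0, 0.5]]

# Hypothetical downstream task: (input, label) pairs whose decision
# boundary differs from the one the frozen model encodes.
DATA = [([2.0, 0.0, 0.0, 0.0], 0),
        ([1.5, 0.0, 0.0, 0.0], 0),
        ([0.5, 0.0, 0.0, 0.0], 1),
        ([0.25, 0.0, 0.0, 0.0], 1),
        ([0.0, 1.0, 0.0, 0.0], 1)]


def logits(x):
    """Class scores of the frozen model for input x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in FROZEN_W]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def add_prompt(x, prompt):
    """Apply the single shared perturbation to an input."""
    return [xi + pi for xi, pi in zip(x, prompt)]


def num_correct(data, prompt):
    """Count inputs the frozen model classifies correctly when prompted."""
    correct = 0
    for x, y in data:
        z = logits(add_prompt(x, prompt))
        correct += int(z.index(max(z)) == y)
    return correct


def train_prompt(data, steps=200, lr=0.2):
    """Learn one perturbation by gradient descent; FROZEN_W is untouched."""
    prompt = [0.0] * len(data[0][0])
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for x, y in data:
            p = softmax(logits(add_prompt(x, prompt)))
            # Cross-entropy gradient w.r.t. logit k is p[k] - 1[k == y];
            # the chain rule through the frozen weights gives the input gradient.
            for k, row in enumerate(FROZEN_W):
                dz = p[k] - (1.0 if k == y else 0.0)
                for j, w in enumerate(row):
                    grad[j] += dz * w
        prompt = [pi - lr * g / len(data) for pi, g in zip(prompt, grad)]
    return prompt


if __name__ == "__main__":
    zero_prompt = [0.0] * 4
    learned = train_prompt(DATA)
    print("correct without prompt:", num_correct(DATA, zero_prompt))
    print("correct with learned prompt:", num_correct(DATA, learned))
```

Because this toy model is linear, the learned perturbation can only shift its decision threshold. With a deep nonlinear model such as CLIP, the same training loop (frozen weights, gradients flowing only to the input perturbation) can change behavior in richer ways, which is the setting the paper studies.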
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Cell Image Analysis Techniques · Advanced Vision and Imaging
Methods: Contrastive Language-Image Pre-training
