Visual Prompting via Image Inpainting
Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros

TL;DR
This paper introduces a novel visual prompting method using image inpainting with pre-trained models, enabling adaptation to new tasks without fine-tuning, demonstrated on diverse image-to-image applications.
Contribution
It proposes framing visual prompting as image inpainting with pre-trained masked auto-encoders trained on a large dataset, showing effectiveness across multiple tasks.
Findings
Effective task adaptation without fine-tuning.
Successful application to various image-to-image tasks.
Pre-trained inpainting models generalize well to downstream tasks.
Abstract
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting - literally just filling in a hole in a concatenated visual prompt image - turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked auto-encoders on a new dataset that we curated - 88k unlabeled figures from academic papers sources on Arxiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground…
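The prompt construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2 x 2 grid layout, the grayscale list-of-rows image type, and the names `build_visual_prompt`, `run_visual_prompt`, and `inpaint` are assumptions made for the example. The inpainting model itself is left as a caller-supplied function.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)
Mask = List[List[bool]]    # True marks pixels the inpainting model must fill


def build_visual_prompt(example_in: Image, example_out: Image,
                        query: Image, fill: float = 0.0) -> Tuple[Image, Mask]:
    """Concatenate images into one canvas with a hole (assumed 2 x 2 layout).

    Top-left: example input, top-right: example output,
    bottom-left: query input, bottom-right: masked hole.
    """
    h, w = len(query), len(query[0])
    for img in (example_in, example_out, query):
        if len(img) != h or any(len(row) != w for row in img):
            raise ValueError("all images must share the same height and width")
    top = [a + b for a, b in zip(example_in, example_out)]
    bottom = [list(row) + [fill] * w for row in query]
    canvas = top + bottom
    mask = ([[False] * (2 * w) for _ in range(h)]
            + [[False] * w + [True] * w for _ in range(h)])
    return canvas, mask


def run_visual_prompt(inpaint: Callable[[Image, Mask], Image],
                      example_in: Image, example_out: Image,
                      query: Image) -> Image:
    """Fill the hole with a caller-supplied inpainting function and crop it out."""
    canvas, mask = build_visual_prompt(example_in, example_out, query)
    filled = inpaint(canvas, mask)  # hypothetical model call, no fine-tuning involved
    h, w = len(query), len(query[0])
    return [row[w:] for row in filled[h:]]  # bottom-right cell is the prediction
```

In this sketch the task is specified only by the example pair placed in the top row; swapping in a different pair changes the task without touching the model.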
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques
Methods: Test · Inpainting
