Visual Instruction Inversion: Image Editing via Visual Prompting
Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee

TL;DR
This paper introduces Visual Instruction Inversion, a method that inverts a visual prompt (a "before" and "after" image pair) into a text-based editing instruction, which a pretrained text-to-image diffusion model can then apply to new images.
Contribution
It presents an approach to image editing that inverts visual prompts into editing instructions, enabling effective edits from as few as one example pair.
Findings
Learns a reusable text-based editing direction from as little as one "before"/"after" example pair.
Achieves results competitive with state-of-the-art text-conditioned image editing frameworks.
Utilizes pretrained diffusion models for visual prompt inversion.
Abstract
Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
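The idea of inverting a "before"/"after" pair into an instruction can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the summary here does not state the actual training objective, so the sketch assumes a hypothetical frozen editor (a fixed linear map named frozen_edit standing in for a pretrained diffusion model) and a plain squared-error loss. Only the instruction vector is optimized; the editor's weights never change. All names and numbers below are illustrative assumptions.

```python
# Toy illustration only. W and frozen_edit are hypothetical stand-ins for a
# frozen, pretrained instruction-conditioned editor (a diffusion model in the
# paper); the squared-error objective is an assumption, not the paper's loss.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5],
     [0.2, -0.3]]


def frozen_edit(image, instruction):
    """Apply the frozen editor: each output value is the input value plus a
    fixed linear function of the instruction vector."""
    return [x + sum(w * c for w, c in zip(row, instruction))
            for x, row in zip(image, W)]


def invert_instruction(before, after, steps=500, lr=0.1):
    """Gradient descent on the instruction only, so that
    frozen_edit(before, instruction) approaches `after`."""
    c = [0.0] * len(W[0])
    for _ in range(steps):
        pred = frozen_edit(before, c)
        residual = [p - a for p, a in zip(pred, after)]
        # Gradient of sum(residual**2) with respect to c is 2 * W^T * residual.
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(c))]
        c = [cj - lr * gj for cj, gj in zip(c, grad)]
    return c


# One example pair produced by an instruction that is hidden from the learner.
hidden_instruction = [0.8, -0.4]
before = [0.1, 0.2, 0.3, 0.4]
after = frozen_edit(before, hidden_instruction)

# Recover an instruction from the single pair, then reuse it on a new image.
learned = invert_instruction(before, after)
new_image = [0.9, 0.5, 0.0, 0.7]
edited_new = frozen_edit(new_image, learned)
```

In this toy setting the learned vector matches the hidden instruction, so the same edit transfers to new_image. A real diffusion-based editor is nonlinear and stochastic, so no such exact recovery should be expected there.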
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
