Watch Your Steps: Local Image and Scene Editing by Text Instructions
Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

TL;DR
This paper introduces a novel method for localizing and guiding image and scene edits based on text instructions using relevance maps derived from diffusion models, improving precision and quality in 2D and 3D editing tasks.
Contribution
It proposes a new relevance map technique to localize edits in images and 3D scenes guided by text, enhancing editing accuracy and quality over previous methods.
Findings
Achieves state-of-the-art results in image editing tasks.
Enhances 3D scene editing with relevance-guided neural radiance fields.
Effectively localizes edits using relevance maps derived from instruction discrepancies.
Abstract
Denoising diffusion models have enabled high-quality image generation and editing. We present a method to localize the desired edit region implicit in a text instruction. We leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. This discrepancy is referred to as the relevance map. The relevance map conveys the importance of changing each pixel to achieve the edits, and is used to guide the modifications. This guidance ensures that the irrelevant pixels remain unchanged. Relevance maps are further used to enhance the quality of text-guided editing of 3D scenes in the form of neural radiance fields. A field is trained on relevance maps of training views, denoted as the relevance field, defining the 3D region within which modifications should be made. We perform iterative updates on the training views guided by rendered…
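The relevance-map idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `relevance_map` and `masked_blend`, the channel-averaged absolute difference, the min-max normalization, and the fixed threshold of 0.5 are all assumptions made for illustration. The two inputs stand in for IP2P noise predictions computed with and without the text instruction; how those predictions are obtained is outside this sketch.

```python
import numpy as np

def relevance_map(eps_with, eps_without):
    """Per-pixel discrepancy between two predictions of shape (C, H, W).

    Assumption: discrepancy is the absolute difference averaged over
    channels, then min-max normalized to [0, 1].
    """
    diff = np.abs(eps_with - eps_without).mean(axis=0)  # shape (H, W)
    span = diff.max() - diff.min()
    if span == 0:
        # The two predictions agree everywhere: nothing is relevant.
        return np.zeros_like(diff)
    return (diff - diff.min()) / span

def masked_blend(original, edited, relevance, threshold=0.5):
    """Keep edited pixels only where relevance reaches the threshold.

    Assumption: a hard binary mask; pixels below the threshold are copied
    from the original so that irrelevant regions stay unchanged.
    """
    mask = (relevance >= threshold).astype(original.dtype)  # shape (H, W)
    # The (H, W) mask broadcasts across the channel axis of (C, H, W) inputs.
    return mask * edited + (1.0 - mask) * original
```

In this sketch, a region where the instruction changes the prediction receives relevance near 1 and takes the edited pixels, while the rest of the image is passed through untouched.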
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications
Methods: Diffusion
