Watch Your Steps: Local Image and Scene Editing by Text Instructions

Ashkan Mirzaei; Tristan Aumentado-Armstrong; Marcus A. Brubaker; Jonathan Kelly; Alex Levinshtein; Konstantinos G. Derpanis; Igor Gilitschenski

arXiv:2308.08947·cs.CV·July 4, 2024·2 cites

Open Access

TL;DR

This paper introduces a novel method for localizing and guiding image and scene edits based on text instructions using relevance maps derived from diffusion models, improving precision and quality in 2D and 3D editing tasks.

Contribution

It proposes a new relevance map technique to localize edits in images and 3D scenes guided by text, enhancing editing accuracy and quality over previous methods.

Findings

01. Achieves state-of-the-art results in image editing tasks.

02. Enhances 3D scene editing with relevance-guided neural radiance fields.

03. Effectively localizes edits using relevance maps derived from instruction discrepancies.

Abstract

Denoising diffusion models have enabled high-quality image generation and editing. We present a method to localize the desired edit region implicit in a text instruction. We leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. This discrepancy is referred to as the relevance map. The relevance map conveys the importance of changing each pixel to achieve the edits, and is used to guide the modifications. This guidance ensures that the irrelevant pixels remain unchanged. Relevance maps are further used to enhance the quality of text-guided editing of 3D scenes in the form of neural radiance fields. A field is trained on relevance maps of training views, denoted as the relevance field, defining the 3D region within which modifications should be made. We perform iterative updates on the training views guided by rendered…
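
To make the relevance-map idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the two IP2P predictions (with and without the instruction) are already available as arrays of shape (C, H, W) with the same spatial size as the image; the function names `relevance_map` and `guided_blend`, the quantile clipping, and the hard threshold are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def relevance_map(pred_with, pred_without, clip_quantile=0.99):
    """Per-pixel discrepancy between two predictions, scaled to [0, 1].

    pred_with / pred_without: arrays of shape (C, H, W), standing in for
    IP2P outputs with and without the text instruction (hypothetical inputs).
    clip_quantile is an assumed outlier-handling choice.
    """
    # Average the absolute difference over channels: (C, H, W) -> (H, W).
    diff = np.abs(pred_with - pred_without).mean(axis=0)
    # Clip extreme values so a few outliers do not dominate the scaling.
    upper = np.quantile(diff, clip_quantile)
    diff = np.clip(diff, 0.0, upper)
    # Min-max normalize; return all zeros if the map is constant.
    low = diff.min()
    span = diff.max() - low
    if span == 0:
        return np.zeros_like(diff)
    return (diff - low) / span


def guided_blend(original, edited, relevance, threshold=0.5):
    """Keep original pixels wherever relevance is below the threshold.

    This is a simplified stand-in for the guidance step: a hard mask
    that leaves irrelevant pixels exactly unchanged.
    """
    mask = (relevance >= threshold).astype(original.dtype)
    # Broadcast the (H, W) mask over the channel axis.
    return mask[None] * edited + (1.0 - mask[None]) * original
```

In this sketch, pixels whose predictions barely change when the instruction is removed get relevance near 0 and are copied from the original, which mirrors the abstract's claim that irrelevant pixels remain unchanged.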

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications

Methods: Diffusion