LocInv: Localization-aware Inversion for Text-Guided Image Editing
Chuanming Tang, Kai Wang, Fei Yang, Joost van de Weijer

TL;DR
LocInv introduces a localization-aware inversion method that refines cross-attention maps in diffusion models, enabling precise, object-specific image editing guided by text prompts and localization priors.
Contribution
The paper presents a novel localization-aware inversion technique that improves text-guided image editing accuracy by aligning cross-attention maps with localization priors, reducing unintended modifications.
Findings
Achieves fine-grained object editing with minimal unintended changes.
Outperforms existing methods both quantitatively and qualitatively on the COCO dataset.
Demonstrates effectiveness of localization priors in diffusion-based image editing.
Abstract
Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Based on the T2I diffusion models, text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts. However, existing image editing techniques are prone to editing over unintentional regions that are beyond the intended target area, primarily due to inaccuracies in cross-attention maps. To address this problem, we propose Localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps in the denoising phases of the diffusion process. Through the dynamic updating of tokens corresponding to noun words in the textual input, we are compelling the cross-attention maps to closely align with the correct noun and adjective words in…
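The core idea in the abstract, updating a noun token so that its cross-attention map lines up with a localization prior, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`attention_map`, `alignment_loss`, `refine_noun_token`, `demo`), the two-token softmax, the random pixel queries, and the binary cross-entropy objective are all assumptions chosen to keep the example small and self-contained. A real system would take queries from the diffusion U-Net and a mask from a segmentation map or bounding box.

```python
import numpy as np

def attention_map(queries, noun_token, background_token):
    """Per-pixel attention on the noun token (toy: softmax over two tokens)."""
    # With only two tokens, the softmax reduces to a sigmoid of the logit gap.
    logits = queries @ (noun_token - background_token)
    return 1.0 / (1.0 + np.exp(-logits))

def alignment_loss(attn, mask, eps=1e-8):
    """Binary cross-entropy between attention and mask (a stand-in objective)."""
    terms = mask * np.log(attn + eps) + (1.0 - mask) * np.log(1.0 - attn + eps)
    return float(-np.mean(terms))

def refine_noun_token(queries, noun_token, background_token, mask,
                      lr=0.5, steps=200):
    """Gradient descent on the noun token only; queries stay fixed."""
    token = noun_token.copy()
    for _ in range(steps):
        attn = attention_map(queries, token, background_token)
        # Closed-form gradient of the cross-entropy w.r.t. the noun token.
        grad = queries.T @ (attn - mask) / mask.size
        token -= lr * grad
    return token

def demo(seed=0):
    """Return the alignment loss before and after refining the noun token."""
    rng = np.random.default_rng(seed)
    height, width, dim = 8, 8, 16
    # Hypothetical localization prior: a square object mask, flattened.
    mask = np.zeros((height, width))
    mask[2:6, 2:6] = 1.0
    mask = mask.ravel()
    # Random stand-ins for per-pixel query features and token embeddings.
    queries = rng.normal(size=(height * width, dim))
    noun = 0.1 * rng.normal(size=dim)
    background = np.zeros(dim)
    before = alignment_loss(attention_map(queries, noun, background), mask)
    refined = refine_noun_token(queries, noun, background, mask)
    after = alignment_loss(attention_map(queries, refined, background), mask)
    return before, after

if __name__ == "__main__":
    loss_before, loss_after = demo()
    print(f"alignment loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Only the token embedding is optimized here, which mirrors the abstract's description of dynamically updating noun tokens rather than the model weights. The loss itself is a simplification and may differ from the objective used in the paper.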
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques
Methods: Diffusion · ALIGN
