Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
Jingxuan He, Xiyu Wang, Mengyu Zheng, Xiangyu Zeng, Yunke Wang, Chang Xu

TL;DR
This paper introduces a task-aware localization framework for instruction-based image editing. It uses attention cues to identify edit regions explicitly, which improves consistency in non-edit regions while maintaining instruction-following performance.
Contribution
It proposes a training-free, task-aware edit localization method that exploits the source and target image streams within IIE models to address over-editing in current diffusion-transformer-based models.
Findings
Improves non-edit region consistency in image editing.
Maintains strong instruction-following performance.
Enhances editing accuracy on EdiVal-Bench.
Abstract
Instruction-based image editing (IIE) aims to modify images according to textual instructions while preserving irrelevant content. Despite recent advances in diffusion transformers, existing methods often suffer from over-editing, introducing unintended changes to regions unrelated to the desired edit. We identify that this limitation arises from the lack of an explicit mechanism for edit localization. In particular, different editing operations (e.g., addition, removal, and replacement) induce distinct spatial patterns, yet current IIE models typically treat localization in a task-agnostic manner. To address this limitation, we propose a training-free, task-aware edit localization framework that exploits the intrinsic source and target image streams within IIE models. For each image stream, we first obtain attention-based edit cues, and then construct feature centroids based on these…
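As a rough illustration of the two steps the abstract names (attention-based edit cues, then feature centroids built from them), here is a minimal sketch. It is not the authors' implementation. The function name `cue_centroids`, the quantile threshold, and the toy per-token inputs are all assumptions made for illustration, and the sketch stops at the centroids because the abstract is truncated before it says how they are used.

```python
from typing import List, Tuple

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cue_centroids(
    attention: List[float],
    features: List[Vector],
    quantile: float = 0.8,
) -> Tuple[Vector, Vector, List[bool]]:
    """Return (cue centroid, non-cue centroid, cue mask) for one image stream.

    `attention` holds one score per image token and `features` one feature
    vector per token. Both are hypothetical stand-ins for whatever the
    editing model exposes internally.
    """
    if len(attention) != len(features) or len(attention) < 2:
        raise ValueError("need matching attention and features for >= 2 tokens")

    # Assumed rule: tokens at or above the chosen quantile count as edit cues.
    ranked = sorted(attention)
    cut = ranked[min(int(quantile * len(ranked)), len(ranked) - 1)]
    mask = [score >= cut for score in attention]
    if all(mask):
        raise ValueError("attention scores do not separate cue and non-cue tokens")

    cue = [f for f, is_cue in zip(features, mask) if is_cue]
    rest = [f for f, is_cue in zip(features, mask) if not is_cue]
    return mean_vector(cue), mean_vector(rest), mask


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.9, 0.8]
    feats = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
    cue_c, rest_c, cue_mask = cue_centroids(scores, feats, quantile=0.5)
    print(cue_mask, cue_c, rest_c)
```

In this toy setup the same function would be called once for the source stream and once for the target stream. How the two resulting sets of centroids are combined per editing task is left open here.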
