Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

Jingxuan He; Xiyu Wang; Mengyu Zheng; Xiangyu Zeng; Yunke Wang; Chang Xu

arXiv:2604.20258·cs.CV·April 23, 2026

TL;DR

This paper introduces a training-free, task-aware localization framework for instruction-based image editing. It identifies edit regions explicitly from attention cues, which improves consistency in non-edit regions while maintaining instruction-following performance.

Contribution

The paper proposes a training-free, task-aware edit localization method that exploits the source and target image streams already present in instruction-based image editing models, addressing the over-editing seen in current diffusion-transformer-based editors.

Findings

01. Improves non-edit region consistency in image editing.

02. Maintains strong instruction-following performance.

03. Enhances editing accuracy on EdiVal-Bench.

Abstract

Instruction-based image editing (IIE) aims to modify images according to textual instructions while preserving irrelevant content. Despite recent advances in diffusion transformers, existing methods often suffer from over-editing, introducing unintended changes to regions unrelated to the desired edit. We identify that this limitation arises from the lack of an explicit mechanism for edit localization. In particular, different editing operations (e.g., addition, removal and replacement) induce distinct spatial patterns, yet current IIE models typically treat localization in a task-agnostic manner. To address this limitation, we propose a training-free, task-aware edit localization framework that exploits the intrinsic source and target image streams within IIE models. For each image stream, we first obtain attention-based edit cues, and then construct feature centroids based on these…
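The abstract is cut off after the centroid step, so the following is only a minimal sketch of one generic way to go from attention cues to feature centroids to a region mask. It is not the authors' method. The function `localize`, the `top_fraction` parameter, the cosine-similarity assignment rule, and the toy data are all assumptions made for illustration, and the sketch handles a single image stream with no task-specific logic.

```python
from math import sqrt


def centroid(vectors):
    """Mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def localize(features, attention, top_fraction=0.25):
    """Return a 0/1 edit mask with one entry per image token.

    features:  per-token feature vectors (hypothetical input).
    attention: per-token attention scores toward the instruction
               (hypothetical stand-in for the "attention-based edit cues").
    """
    n = len(features)
    if n < 2 or len(attention) != n:
        raise ValueError("need at least two tokens and one score per token")

    # Step 1 (assumed): treat the most-attended tokens as edit seeds.
    k = max(1, min(n - 1, round(n * top_fraction)))
    order = sorted(range(n), key=lambda i: attention[i], reverse=True)
    seeds = set(order[:k])

    # Step 2 (assumed): one centroid for seed tokens, one for the rest.
    edit_c = centroid([features[i] for i in seeds])
    keep_c = centroid([features[i] for i in range(n) if i not in seeds])

    # Step 3 (assumed): assign every token to the more similar centroid.
    return [1 if cosine(f, edit_c) > cosine(f, keep_c) else 0 for f in features]


# Toy data: tokens 0 and 1 look alike, but only token 0 is strongly attended.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
attention = [0.9, 0.2, 0.1, 0.05]
print(localize(features, attention))  # [1, 1, 0, 0]
```

In this toy case token 1 receives little attention yet is still placed in the edit region because its features sit close to the seed centroid, which is the kind of gap a centroid step could plausibly fill over raw attention thresholding.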
