Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning

Qingdong He; Xueqin Chen; Chaoyi Wang; Yanjie Pan; Xiaobin Hu; Zhenye Gan; Yabiao Wang; Chengjie Wang; Xiangtai Li; Jiangning Zhang

arXiv:2507.01908·cs.CV·May 14, 2026

TL;DR

This paper introduces Reason50K, a large-scale dataset for training and evaluating image editing under hypothetical instructions, and ReasonBrain, a framework that combines multimodal models with fine-grained reasoning to handle complex, implicit editing instructions.

Contribution

The paper contributes a new dataset (Reason50K) and a reasoning-based image editing framework (ReasonBrain) that handles complex, implicit instructions across diverse scenarios.

Findings

01. ReasonBrain outperforms state-of-the-art methods on reasoning scenarios.

02. The framework demonstrates strong zero-shot generalization to traditional image editing tasks.

03. Reason50K enables training and evaluation of reasoning-aware image editing models.

Abstract

Instruction-based image editing (IIE) has advanced rapidly with the success of diffusion models. However, existing efforts primarily focus on simple and explicit instructions to execute editing operations such as adding, deleting, moving, or swapping objects. They struggle to handle more complex implicit hypothetical instructions that require deeper reasoning to infer plausible visual changes and user intent. Additionally, current datasets provide limited support for training and evaluating reasoning-aware editing capabilities. Architecturally, these methods also lack mechanisms for fine-grained detail extraction that support such reasoning. To address these limitations, we propose Reason50K, a large-scale dataset specifically curated for training and evaluating hypothetical instruction reasoning image editing, along with ReasonBrain, a novel framework designed to reason over and…
