TL;DR
This paper introduces Reason50K, a large dataset for hypothetical instruction reasoning in image editing, and ReasonBrain, a framework that combines multimodal models and fine-grained reasoning for complex, implicit editing instructions.
Contribution
The paper contributes Reason50K, a dataset for training and evaluating hypothetical instruction reasoning, and ReasonBrain, a reasoning-based image editing framework that handles complex, implicit instructions across diverse scenarios.
Findings
ReasonBrain outperforms state-of-the-art methods on editing scenarios that require reasoning.
The framework demonstrates strong zero-shot generalization to traditional image editing tasks.
Reason50K enables training and evaluation of reasoning-aware image editing models.
Abstract
Instruction-based image editing (IIE) has advanced rapidly with the success of diffusion models. However, existing efforts primarily focus on simple and explicit instructions to execute editing operations such as adding, deleting, moving, or swapping objects. They struggle to handle more complex implicit hypothetical instructions that require deeper reasoning to infer plausible visual changes and user intent. Additionally, current datasets provide limited support for training and evaluating reasoning-aware editing capabilities. Architecturally, these methods also lack mechanisms for fine-grained detail extraction that support such reasoning. To address these limitations, we propose Reason50K, a large-scale dataset specifically curated for training and evaluating hypothetical instruction reasoning image editing, along with ReasonBrain, a novel framework designed to reason over and…
