MIRA: Multimodal Iterative Reasoning Agent for Image Editing

Ziyun Zeng; Hang Hua; Jiebo Luo

arXiv:2511.21087·cs.CV·February 26, 2026

Open Access

TL;DR

MIRA is a multimodal reasoning agent that iteratively interprets and executes complex image editing instructions, improving accuracy and quality through visual feedback and a specialized dataset.

Contribution

Introduces MIRA, a novel iterative reasoning framework for image editing that leverages multimodal feedback and a new dataset for enhanced performance.

Findings

01 MIRA outperforms existing models in semantic consistency.

02 MIRA achieves comparable or better results than proprietary systems.

03 The approach effectively handles complex, compositional editing instructions.

Abstract

Instruction-guided image editing offers an intuitive way for users to edit images with natural language. However, diffusion-based editing models often struggle to accurately interpret complex user instructions, especially those involving compositional relationships, contextual cues, or referring expressions, leading to edits that drift semantically or fail to reflect the intended changes. We tackle this problem by proposing MIRA (Multimodal Iterative Reasoning Agent), a lightweight, plug-and-play multimodal reasoning agent that performs editing through an iterative perception-reasoning-action loop, effectively simulating multi-turn human-model interaction processes. Instead of issuing a single prompt or static plan, MIRA predicts atomic edit instructions step by step, using visual feedback to make its decisions. Our 150K multimodal tool-use dataset, MIRA-Editing, combined with a…
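
The perception-reasoning-action loop described in the abstract can be illustrated with a minimal sketch. This is not MIRA's implementation: `iterative_edit`, `EditTrace`, `propose_step`, `apply_edit`, and `max_steps` are hypothetical names, and the toy stand-ins below treat an "image" as a set of tags rather than pixels. The sketch only shows the control flow: look at the current image, predict one atomic edit, apply it, and repeat until the reasoner decides nothing is left to do.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class EditTrace:
    """Current image plus the atomic edits applied so far (hypothetical type)."""
    image: Any
    steps: List[str] = field(default_factory=list)


def iterative_edit(
    image: Any,
    instruction: str,
    propose_step: Callable[[Any, str, List[str]], Optional[str]],
    apply_edit: Callable[[Any, str], Any],
    max_steps: int = 5,
) -> EditTrace:
    """Run a perceive -> reason -> act loop until the reasoner stops or the budget ends."""
    trace = EditTrace(image=image)
    for _ in range(max_steps):
        # Perception + reasoning: inspect the *current* image (visual feedback)
        # and the edit history, then predict one atomic instruction or None.
        step = propose_step(trace.image, instruction, trace.steps)
        if step is None:  # the reasoner judges the instruction satisfied
            break
        # Action: hand the atomic instruction to the editing model.
        trace.image = apply_edit(trace.image, step)
        trace.steps.append(step)
    return trace


# Toy stand-ins (assumptions for illustration only): the "image" is a
# frozenset of tags, and an edit simply adds its own text as a tag.
def toy_propose(image: frozenset, instruction: str, history: List[str]) -> Optional[str]:
    for part in (p.strip() for p in instruction.split(";")):
        if part and part not in image:
            return part  # first sub-edit not yet visible in the image
    return None


def toy_apply(image: frozenset, step: str) -> frozenset:
    return image | {step}


result = iterative_edit(frozenset(), "add hat; recolor sky", toy_propose, toy_apply)
print(result.steps)
```

In a real system, `propose_step` would be a multimodal model and `apply_edit` an image editing backend; because the loop only depends on those two callables, the reasoning agent stays separable from the editor, which is one way to read the abstract's "plug-and-play" description.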

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship