IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection

Fei Shen, Chengyu Xie, Lihong Wang, Zhanyi Zhang, Xin Jiang, Xiaoyu Du, and Jinhui Tang

arXiv:2603.29602 · cs.GR · April 1, 2026

TL;DR

IMAGAgent is a multi-turn image editing agent framework built on a closed-loop "plan-execute-reflect" mechanism. It targets the error accumulation and semantic drift that build up when complex, multi-step edits are executed as isolated single steps.

Contribution

The paper presents a constraint-aware planning module, in which a vision-language model (VLM) decomposes a complex instruction into executable sub-tasks, and combines it with tool scheduling and reflection-driven adaptive correction in a single pipeline.

Findings

01  Outperforms existing methods in instruction consistency and editing precision.

02  Demonstrates significant improvements on MTEditBench and MagicBrush datasets.

03  Achieves higher overall image quality in multi-turn editing tasks.

Abstract

Existing multi-turn image editing paradigms are often confined to isolated single-step execution. Lacking context-awareness and closed-loop feedback, they are prone to error accumulation and semantic drift over multi-turn interactions, which ultimately produces severe structural distortion in the generated images. To address this, we propose IMAGAgent, a multi-turn image editing agent framework based on a "plan-execute-reflect" closed-loop mechanism that achieves deep synergy among instruction parsing, tool scheduling, and adaptive correction within a unified pipeline. Specifically, we first present a constraint-aware planning module that leverages a vision-language model (VLM) to precisely decompose complex natural language instructions into a series of executable sub-tasks, governed by target singularity, semantic atomicity, and visual perceptibility. Then, the…
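The closed loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: every name here (`SubTask`, `plan`, `run_closed_loop`, `toy_execute`, `toy_reflect`) is hypothetical, the regex-based planner merely stands in for the VLM planner, and an "image" is modelled as a tuple recording its edit history. The sketch only shows the control flow: decompose the instruction, execute one sub-task at a time, and accept a result only after a reflection check, retrying otherwise.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy stand-in (assumption): an "image" is just the tuple of edits applied so far.
Image = Tuple[str, ...]


@dataclass(frozen=True)
class SubTask:
    instruction: str


def plan(instruction: str) -> List[SubTask]:
    """Hypothetical planner: split a compound instruction into single-target steps.

    A real system would query a VLM here; a regex split is only a placeholder.
    """
    parts = re.split(r"\s*(?:,\s*then\b|\band\b|;)\s*", instruction)
    return [SubTask(p) for p in (part.strip() for part in parts) if p]


def run_closed_loop(
    image: Image,
    instruction: str,
    execute: Callable[[Image, SubTask, int], Image],
    reflect: Callable[[Image, Image, SubTask], bool],
    max_retries: int = 2,
) -> Tuple[Image, List[Tuple[str, int, bool]]]:
    """Plan, then execute each sub-task, keeping a result only if reflection accepts it."""
    log: List[Tuple[str, int, bool]] = []
    for task in plan(instruction):
        accepted = False
        attempts = 0
        for attempt in range(max_retries + 1):
            attempts = attempt + 1
            candidate = execute(image, task, attempt)
            if reflect(image, candidate, task):
                image = candidate  # commit only verified edits
                accepted = True
                break
        # On repeated failure the last good image is kept, so errors do not accumulate.
        log.append((task.instruction, attempts, accepted))
    return image, log


def toy_execute(image: Image, task: SubTask, attempt: int) -> Image:
    """Toy executor that botches the first attempt at any edit mentioning 'dog'."""
    if "dog" in task.instruction and attempt == 0:
        return image + ("<garbled>",)
    return image + (task.instruction,)


def toy_reflect(before: Image, after: Image, task: SubTask) -> bool:
    """Toy reflector: accept only if the newest edit matches the requested one."""
    return len(after) == len(before) + 1 and after[-1] == task.instruction


if __name__ == "__main__":
    final, log = run_closed_loop(
        ("source",),
        "add a red hat and remove the dog, then brighten the sky",
        toy_execute,
        toy_reflect,
    )
    print(final)
    print(log)
```

Because a candidate is committed only after the reflection check passes, a failed edit never becomes the input to the next sub-task. That is the property the abstract credits to closed-loop feedback, shown here in a deliberately simplified form.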

Code & Models

Repositories

hackermmzz/IMAGAgent.git (GitHub)