ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning

Yiran Zhao; Yaoqi Ye; Xiang Liu; Michael Qizhe Shieh; Trung Bui

arXiv:2603.08059 · cs.CV · March 10, 2026

Open Access

TL;DR

ImageEdit-R1 is a multi-agent framework for complex, context-aware image editing. It uses reinforcement learning to coordinate specialized pretrained vision-language and generative agents, and is reported to outperform existing models on complex, indirect, or multi-step instructions.

Contribution

This work presents a novel multi-agent reinforcement learning approach that enables dynamic, goal-oriented image editing through coordinated decision-making among specialized agents.

Findings

01 Outperforms existing closed-source diffusion models

02 Achieves better results on multiple image editing datasets

03 Demonstrates effective multi-agent collaboration in editing tasks

Abstract

With the rapid advancement of commercial multi-modal models, image editing has garnered significant attention due to its widespread applicability in daily life. Despite impressive progress, existing image editing systems, particularly closed-source or proprietary models, often struggle with complex, indirect, or multi-step user instructions. These limitations hinder their ability to perform nuanced, context-aware edits that align with human intent. In this work, we propose ImageEdit-R1, a multi-agent framework for intelligent image editing that leverages reinforcement learning to coordinate high-level decision-making across a set of specialized, pretrained vision-language and generative agents. Each agent is responsible for distinct capabilities--such as understanding user intent, identifying regions of interest, selecting appropriate editing actions, and synthesizing visual…
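
As a rough illustration of the division of labor the abstract describes (intent understanding, region identification, action selection, synthesis), the following minimal Python sketch chains four stub agents over a shared state object. Every name here (`EditState`, `understand_intent`, `identify_region`, `select_action`, `synthesize`, `run_pipeline`) is hypothetical and not taken from the paper. The stubs use trivial string rules in place of pretrained models, and the reinforcement learning coordination is not modeled at all.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EditState:
    """Shared record handed from one (hypothetical) agent to the next."""
    instruction: str
    intent: str = ""
    region: str = ""
    action: str = ""
    output: str = ""
    trace: List[str] = field(default_factory=list)


def understand_intent(state: EditState) -> EditState:
    # Stand-in for a vision-language agent: just normalize the instruction.
    state.intent = state.instruction.strip().lower()
    return state


def identify_region(state: EditState) -> EditState:
    # Stand-in for a grounding agent: treat the last word as the target.
    words = state.intent.split()
    state.region = words[-1] if words else ""
    return state


def select_action(state: EditState) -> EditState:
    # Stand-in for action selection: a fixed keyword rule, not a learned policy.
    state.action = "remove" if "remove" in state.intent.split() else "modify"
    return state


def synthesize(state: EditState) -> EditState:
    # Stand-in for a generative agent: describe the edit instead of rendering it.
    state.output = f"{state.action}:{state.region}"
    return state


# Fixed agent order, assumed for illustration only.
PIPELINE: List[Callable[[EditState], EditState]] = [
    understand_intent,
    identify_region,
    select_action,
    synthesize,
]


def run_pipeline(instruction: str) -> EditState:
    """Run each stub agent in order and record which ones ran."""
    state = EditState(instruction=instruction)
    for agent in PIPELINE:
        state = agent(state)
        state.trace.append(agent.__name__)
    return state


if __name__ == "__main__":
    result = run_pipeline("Remove the red car")
    print(result.output, result.trace)
```

In a real system each stub would wrap a pretrained model, and the fixed ordering would presumably give way to whatever learned coordination the paper proposes. The sketch only shows how a shared state can carry one agent's output to the next.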

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cell Image Analysis Techniques