Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions

Bo Zhao; Kairui Guo; Runnan Du; Haiyang Sun; Pengshan Wang; Huan Yang; Kun Gai; Yixin Cao; Wei Ji

arXiv:2604.15917·cs.CV·April 20, 2026

TL;DR

This paper introduces an adaptive task reformulation framework for instruction-guided image editing. An MLLM agent dynamically reformulates each editing task, which improves performance, especially on challenging cases, without changing the underlying models.

Contribution

It proposes a framework in which a multimodal large language model (MLLM) agent reformulates image editing tasks, improving results across multiple benchmarks and editing backbones.

Findings

01 Consistent performance improvements on the ImgEdit, PICA, and RePlan benchmarks.

02 Large gains on challenging editing cases.

03 Task reformulation is a key factor in improving image editing outcomes.

Abstract

Instruction-guided image editing has advanced substantially with recent generative models, yet it still fails to produce reliable results across many seemingly simple cases. We observe that a large portion of these failures stems not from insufficient model capacity, but from poorly formulated editing tasks, such as those involving small targets, implicit spatial relations, or under-specified instructions. In this work, we frame image editing failures as a task formulation problem and propose an adaptive task reformulation framework that improves editing performance without modifying the underlying model. Our key idea is to transform the original image-instruction pair into a sequence of operations that are dynamically determined and executed by an MLLM agent through analysis, routing, reformulation, and feedback-driven refinement. Experiments on multiple benchmarks, including ImgEdit,…
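The four stages named in the abstract (analysis, routing, reformulation, and feedback-driven refinement) can be pictured as a simple control loop around a frozen editing model. The sketch below is only an illustration under strong assumptions and is not the authors' implementation: `EditTask`, `analyze`, `route`, `reformulate`, `run_agent`, the strategy labels, and the keyword heuristics are all hypothetical. In the paper these decisions are made by an MLLM agent, not by string matching, and the `editor` and `judge` callables stand in for the editing backbone and the agent's feedback step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EditTask:
    image: str        # placeholder for image data, e.g. a file path
    instruction: str


def analyze(task: EditTask) -> dict:
    """Toy analysis: flag the three difficulty types named in the abstract."""
    text = task.instruction.lower()
    return {
        "small_target": any(w in text for w in ("small", "tiny")),
        "spatial": any(w in text for w in ("left of", "right of", "behind")),
        "underspecified": len(text.split()) < 4,
    }


def route(analysis: dict) -> str:
    """Pick a (hypothetical) reformulation strategy from the analysis."""
    if analysis["small_target"]:
        return "crop_edit_paste"
    if analysis["spatial"]:
        return "ground_then_edit"
    if analysis["underspecified"]:
        return "expand_instruction"
    return "direct"


def reformulate(task: EditTask, strategy: str) -> List[str]:
    """Turn the image-instruction pair into a sequence of operations."""
    edit = f"edit: {task.instruction}"
    plans = {
        "crop_edit_paste": ["crop target region", edit, "paste back"],
        "ground_then_edit": ["locate referenced objects", edit],
        "expand_instruction": [edit + " (with agent-added details)"],
        "direct": [edit],
    }
    return plans[strategy]


def run_agent(
    task: EditTask,
    editor: Callable[[str, str], str],
    judge: Callable[[str, EditTask], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Execute the plan, then refine the instruction from judge feedback."""
    strategy = route(analyze(task))
    result = task.image
    for round_idx in range(1, max_rounds + 1):
        result = task.image  # each round restarts from the original image
        for op in reformulate(task, strategy):
            result = editor(result, op)
        ok, feedback = judge(result, task)
        if ok:
            return result, round_idx
        # Feedback-driven refinement: fold the critique into the instruction.
        task = EditTask(task.image, f"{task.instruction}; {feedback}")
    return result, max_rounds
```

The editing model is only ever reached through the `editor` callable, which mirrors the abstract's claim that the gains come from changing the task formulation and leaving the underlying model untouched.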
