Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions
Bo Zhao, Kairui Guo, Runnan Du, Haiyang Sun, Pengshan Wang, Huan Yang, Kun Gai, Yixin Cao, Wei Ji

TL;DR
This paper introduces an adaptive task reformulation framework for instruction-guided image editing. An MLLM agent dynamically transforms each editing task, which improves performance, especially on challenging cases, without changing the underlying editing models.
Contribution
It proposes a framework in which a multimodal large language model (MLLM) agent reformulates image editing tasks, improving results across multiple benchmarks and editing backbones.
Findings
Consistent performance improvements on ImgEdit, PICA, and RePlan benchmarks.
Large gains observed on challenging editing cases.
Task reformulation is a key factor in improving image editing outcomes.
Abstract
Instruction-guided image editing has advanced substantially with recent generative models, yet it still fails to produce reliable results across many seemingly simple cases. We observe that a large portion of these failures stems not from insufficient model capacity, but from poorly formulated editing tasks, such as those involving small targets, implicit spatial relations, or under-specified instructions. In this work, we frame image editing failures as a task formulation problem and propose an adaptive task reformulation framework that improves editing performance without modifying the underlying model. Our key idea is to transform the original image-instruction pair into a sequence of operations that are dynamically determined and executed by an MLLM agent through analysis, routing, reformulation, and feedback-driven refinement. Experiments on multiple benchmarks, including ImgEdit,…
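The four stages named in the abstract (analysis, routing, reformulation, feedback-driven refinement) can be pictured as a small control loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: every name in it (EditTask, analyze, route, reformulate, run_agent, the operation labels, the threshold) is hypothetical, the keyword rules stand in for what an MLLM would decide, and the editor and judge are caller-supplied callables so that the underlying editing model stays untouched.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EditTask:
    image: str        # placeholder for image data, e.g. a file path
    instruction: str


def analyze(task: EditTask) -> dict:
    """Hypothetical analysis step; a real agent would query an MLLM here."""
    words = task.instruction.lower().split()
    return {
        "small_target": any(w in ("small", "tiny") for w in words),
        "under_specified": len(words) < 4,
    }


def route(analysis: dict) -> List[str]:
    """Choose reformulation operations from the analysis (toy rules)."""
    ops = []
    if analysis["under_specified"]:
        ops.append("expand_instruction")
    if analysis["small_target"]:
        ops.append("zoom_on_target")
    return ops


def reformulate(task: EditTask, ops: List[str]) -> EditTask:
    """Rewrite the task; the image and the editing model are left as-is."""
    instruction = task.instruction
    if "expand_instruction" in ops:
        instruction += " (keep everything else unchanged)"
    if "zoom_on_target" in ops:
        instruction = "[crop to target region] " + instruction
    return EditTask(task.image, instruction)


def run_agent(
    task: EditTask,
    editor: Callable[[EditTask], str],
    judge: Callable[[EditTask, str], float],
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, List[str]]:
    """Analyze, route, reformulate, then refine until the judge is satisfied."""
    ops = route(analyze(task))
    current = reformulate(task, ops)
    log = list(ops)
    result = editor(current)
    for _ in range(max_rounds - 1):
        if judge(task, result) >= threshold:
            break
        # Feedback-driven refinement: adjust the instruction and try again.
        current = EditTask(current.image, current.instruction + " [retry: be more precise]")
        log.append("refine")
        result = editor(current)
    return result, log
```

In this sketch the frozen editing backbone would be passed in as editor and an MLLM-based scorer as judge; the returned log records which operations the agent chose for a given task.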
