Image Editing As Programs with Diffusion Models
Yujia Hu, Songhua Liu, Zhenxiong Tan, Xingyi Yang, Xinchao Wang

TL;DR
This paper introduces IEAP, a novel framework using diffusion transformers and programmatic decomposition to improve instruction-driven image editing, especially for complex and structural changes.
Contribution
The paper presents a unified, modular image editing framework that decomposes complex instructions into atomic operations, enhancing robustness and generalization in diffusion-based editing models.
Findings
IEAP outperforms state-of-the-art methods on standard benchmarks.
The modular approach improves handling of complex, multi-step edits.
IEAP achieves higher accuracy and semantic fidelity in image editing.
Abstract
While diffusion models have achieved remarkable success in text-to-image generation, they encounter significant challenges with instruction-driven image editing. Our research highlights a key challenge: these models particularly struggle with structurally inconsistent edits that involve substantial layout changes. To mitigate this gap, we introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. At its core, IEAP approaches instructional editing through a reductionist lens, decomposing complex editing instructions into sequences of atomic operations. Each operation is implemented via a lightweight adapter sharing the same DiT backbone and is specialized for a specific type of edit. Programmed by a vision-language model (VLM)-based agent, these operations collaboratively support arbitrary and structurally…
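The "editing as programs" idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's implementation: the operation names (`remove`, `add`), the `plan` function that stands in for the VLM-based agent, and the list-of-objects stand-in for an image are all hypothetical. In the real system each operation would be a lightweight adapter on a shared DiT backbone acting on pixels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AtomicOp:
    """One step of an editing program: an operation name plus its text argument."""
    name: str
    arg: str


# Stand-in "image": just the list of objects in the scene (hypothetical).
Image = List[str]


def op_remove(image: Image, arg: str) -> Image:
    """Drop every object equal to `arg` from the scene."""
    return [obj for obj in image if obj != arg]


def op_add(image: Image, arg: str) -> Image:
    """Append a new object to the scene."""
    return image + [arg]


# Hypothetical registry: one handler per atomic operation type.
REGISTRY: Dict[str, Callable[[Image, str], Image]] = {
    "remove": op_remove,
    "add": op_add,
}


def plan(instruction: str) -> List[AtomicOp]:
    """Toy planner standing in for the VLM agent.

    'replace X with Y' is decomposed into 'remove X' then 'add Y';
    'remove X' and 'add X' map to a single operation.
    """
    words = instruction.lower().split()
    if not words:
        raise ValueError("empty instruction")
    if words[0] == "replace" and "with" in words:
        i = words.index("with")
        return [
            AtomicOp("remove", " ".join(words[1:i])),
            AtomicOp("add", " ".join(words[i + 1:])),
        ]
    if words[0] in REGISTRY:
        return [AtomicOp(words[0], " ".join(words[1:]))]
    raise ValueError(f"unsupported instruction: {instruction!r}")


def run_program(image: Image, program: List[AtomicOp]) -> Image:
    """Execute the atomic operations in order, feeding each result to the next."""
    for op in program:
        image = REGISTRY[op.name](image, op.arg)
    return image


program = plan("replace cat with dog")
result = run_program(["cat", "sofa"], program)
```

The point of the sketch is the control flow: a planner turns one complex instruction into an ordered list of simple operations, and an executor applies them one after another, so a structurally inconsistent edit such as a replacement reduces to steps that are each easier to perform.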
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship · Cell Image Analysis Techniques
