GIDE: Unlocking Diffusion LLMs for Precise Training-Free Image Editing

Zifeng Zhu; Jiaming Han; Jiaxiang Zhao; Minnan Luo; Xiangyu Yue

arXiv:2603.21176 · cs.CV · March 24, 2026

Open Access

TL;DR

GIDE is a training-free image editing framework for diffusion large language models (DLLMs). It targets precise, high-fidelity edits that preserve unedited regions, and the paper supports it with a new benchmark and extensive experiments.

Contribution

The paper contributes a Discrete Noise Inversion technique and a three-stage editing pipeline (grounding, inversion, refinement) that together enable accurate, versatile, training-free image editing, along with a new comprehensive benchmark.

Findings

01

Outperforms prior methods, with a 51.83% improvement in semantic correctness.

02

Achieves 50.39% higher perceptual quality.

03

Demonstrates broad applicability and photorealistic results.

Abstract

While Diffusion Large Language Models (DLLMs) have demonstrated remarkable capabilities in multi-modal generation, performing precise, training-free image editing remains an open challenge. Unlike continuous diffusion models, the discrete tokenization inherent in DLLMs hinders the application of standard noise inversion techniques, often leading to structural degradation during editing. In this paper, we introduce GIDE (Grounded Inversion for DLLM Image Editing), a unified framework designed to bridge this gap. GIDE incorporates a novel Discrete Noise Inversion mechanism that accurately captures latent noise patterns within the discrete token space, ensuring high-fidelity reconstruction. We then decompose the editing pipeline into grounding, inversion, and refinement stages. This design enables GIDE to support various editing instructions (text, point and box) and operations while…
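To make the idea of a box instruction over a discrete token space more concrete, here is a minimal sketch of how a pixel-space box could be mapped to a token-level edit mask, so that only tokens inside the box are re-masked for regeneration and all other tokens are left untouched. This is not the authors' implementation: the function names `box_to_token_mask` and `remask_tokens`, the `MASK_ID` value, and the assumption of a regular token grid over the image are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

MASK_ID = -1  # hypothetical id for the "masked" token; real tokenizers differ


def box_to_token_mask(
    box: Tuple[float, float, float, float],
    image_size: Tuple[int, int],
    grid_size: Tuple[int, int],
) -> List[List[bool]]:
    """Mark every token cell that overlaps a pixel-space box.

    box is (x0, y0, x1, y1) in pixels, treated as half-open.
    image_size is (width, height); grid_size is (cols, rows).
    Assumes the image is tokenized on a regular grid (an assumption).
    """
    x0, y0, x1, y1 = box
    width, height = image_size
    cols, rows = grid_size
    cell_w = width / cols
    cell_h = height / rows
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # A cell is selected if its pixel extent intersects the box.
            overlaps_x = c * cell_w < x1 and (c + 1) * cell_w > x0
            overlaps_y = r * cell_h < y1 and (r + 1) * cell_h > y0
            row.append(overlaps_x and overlaps_y)
        mask.append(row)
    return mask


def remask_tokens(
    tokens: List[List[int]],
    mask: List[List[bool]],
    mask_id: int = MASK_ID,
) -> List[List[int]]:
    """Return a copy of the token grid with selected cells set to mask_id."""
    return [
        [mask_id if selected else tok for tok, selected in zip(tok_row, mask_row)]
        for tok_row, mask_row in zip(tokens, mask)
    ]


# Usage: a 64x64 image tokenized as a 4x4 grid, editing the central box.
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]
edit_mask = box_to_token_mask((16, 16, 48, 48), (64, 64), (4, 4))
edited = remask_tokens(tokens, edit_mask)
```

In this sketch, tokens outside the box are copied through unchanged, which is one simple way to keep unedited regions intact; how GIDE actually performs grounding, inversion, and refinement is not specified in the text above.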

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques