GIDE: Unlocking Diffusion LLMs for Precise Training-Free Image Editing
Zifeng Zhu, Jiaming Han, Jiaxiang Zhao, Minnan Luo, Xiangyu Yue

TL;DR
GIDE introduces a novel, training-free image editing framework for diffusion large language models, achieving high-fidelity, precise edits while preserving unedited regions, supported by a new benchmark and extensive experimental validation.
Contribution
GIDE presents a Discrete Noise Inversion technique and a multi-stage editing pipeline that together enable accurate, versatile, training-free image editing, and it introduces a new comprehensive benchmark.
Findings
Outperforms prior methods with a 51.83% improvement in semantic correctness.
Achieves 50.39% higher perceptual quality.
Demonstrates broad applicability and photorealistic results.
Abstract
While Diffusion Large Language Models (DLLMs) have demonstrated remarkable capabilities in multi-modal generation, performing precise, training-free image editing remains an open challenge. Unlike continuous diffusion models, the discrete tokenization inherent in DLLMs hinders the application of standard noise inversion techniques, often leading to structural degradation during editing. In this paper, we introduce GIDE (Grounded Inversion for DLLM Image Editing), a unified framework designed to bridge this gap. GIDE incorporates a novel Discrete Noise Inversion mechanism that accurately captures latent noise patterns within the discrete token space, ensuring high-fidelity reconstruction. We then decompose the editing pipeline into grounding, inversion, and refinement stages. This design enables GIDE to support various editing instructions (text, point, and box) and operations while…
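The abstract does not spell out how grounding restricts an edit to a region, so the following is only an illustrative sketch, not GIDE's algorithm. It assumes a patch-based tokenizer that maps an image to a fixed grid of discrete tokens, and it shows how a box instruction could be turned into a token mask so that tokens outside the box are left untouched. The function names, the `patch_size` parameter, and the `mask_id` value are all hypothetical.

```python
import numpy as np


def box_to_token_mask(box, image_size, patch_size):
    """Map a pixel-space box (x0, y0, x1, y1) to a boolean token-grid mask.

    Assumption: each token covers one patch_size x patch_size patch.
    A token is selected if its patch overlaps the box.
    """
    x0, y0, x1, y1 = box
    height, width = image_size
    rows, cols = height // patch_size, width // patch_size

    # Floor the start index and ceil the end index so partially covered
    # patches are included; clamp to the grid bounds.
    r0 = max(0, y0 // patch_size)
    c0 = max(0, x0 // patch_size)
    r1 = min(rows, -(-y1 // patch_size))
    c1 = min(cols, -(-x1 // patch_size))

    mask = np.zeros((rows, cols), dtype=bool)
    mask[r0:r1, c0:c1] = True
    return mask


def mask_region(tokens, mask, mask_id):
    """Replace tokens inside the mask with mask_id; keep the rest unchanged."""
    edited = tokens.copy()
    edited[mask] = mask_id
    return edited


# Toy example: a 64x64 image tokenized into a 4x4 grid of token ids.
tokens = np.arange(16).reshape(4, 4)
mask = box_to_token_mask((16, 16, 48, 40), image_size=(64, 64), patch_size=16)
edited = mask_region(tokens, mask, mask_id=-1)
```

In this toy setup only the masked tokens would be handed back to a model for regeneration, which is one simple way to keep unedited regions identical at the token level. How GIDE actually performs inversion and refinement is not described in the text above.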
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
