LADR: Locality-Aware Dynamic Rescue for Efficient Text-to-Image Generation with Diffusion Large Language Models
Chenglin Wang, Yucheng Zhou, Shawn Chen, Tao Wang, Kai Zhang

TL;DR
LADR is a training-free, spatially aware method that speeds up text-to-image inference in discrete diffusion language models by roughly 4x while maintaining or improving image quality.
Contribution
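LADR introduces a training-free decoding method that exploits the spatial Markov property of images: it prioritizes recovering tokens spatially adjacent to already-observed regions, which speeds up text-to-image inference in discrete diffusion language models.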
Findings
Achieves about 4x inference speedup over baselines.
Maintains or improves image quality, especially in spatial reasoning.
Effective across four benchmark datasets.
Abstract
Discrete Diffusion Language Models have emerged as a compelling paradigm for unified multimodal generation, yet their deployment is hindered by high inference latency arising from iterative decoding. Existing acceleration strategies often require expensive re-training or fail to leverage the 2D spatial redundancy inherent in visual data. To address this, we propose Locality-Aware Dynamic Rescue (LADR), a training-free method that expedites inference by exploiting the spatial Markov property of images. LADR prioritizes the recovery of tokens at the "generation frontier", regions spatially adjacent to observed pixels, thereby maximizing information gain. Specifically, our method integrates morphological neighbor identification to locate candidate tokens, employs a risk-bounded filtering mechanism to prevent error propagation, and utilizes manifold-consistent inverse scheduling to align…
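To make the "generation frontier" idea concrete, here is a minimal sketch, not the authors' implementation. It treats the image as a 2D grid of tokens, finds masked cells that touch an already-decoded cell (a one-step morphological dilation of the decoded set, minus the set itself), and keeps only those whose confidence clears a cutoff. The 4-connected neighborhood, the function names `frontier` and `rescue`, and the plain confidence threshold of 0.9 are all assumptions made for illustration; the abstract does not specify how the paper's risk-bounded filter is defined.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Assumed 4-connected neighborhood; the paper's structuring element may differ.
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def frontier(decoded: List[List[bool]]) -> Set[Cell]:
    """Return masked cells that have at least one decoded neighbor.

    Equivalent to dilating the decoded mask by one step and removing
    the cells that were already decoded.
    """
    rows, cols = len(decoded), len(decoded[0])
    cells: Set[Cell] = set()
    for r in range(rows):
        for c in range(cols):
            if decoded[r][c]:
                continue  # already decoded, not a candidate
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and decoded[nr][nc]:
                    cells.add((r, c))
                    break
    return cells


def rescue(
    decoded: List[List[bool]],
    confidence: List[List[float]],
    threshold: float = 0.9,  # hypothetical cutoff, stands in for the risk bound
) -> Set[Cell]:
    """Keep frontier cells whose model confidence is at least `threshold`."""
    return {(r, c) for (r, c) in frontier(decoded) if confidence[r][c] >= threshold}


# Toy 3x3 token grid with only the center decoded.
decoded = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
confidence = [
    [0.10, 0.95, 0.10],
    [0.50, 1.00, 0.92],
    [0.10, 0.30, 0.10],
]
candidates = frontier(decoded)
accepted = rescue(decoded, confidence)
```

In this toy grid the frontier is the four cells directly above, below, left, and right of the center, and only the two with confidence of at least 0.9 are accepted for early decoding. Corner cells are excluded because they share no edge with a decoded cell under the assumed 4-connectivity.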
