LADR: Locality-Aware Dynamic Rescue for Efficient Text-to-Image Generation with Diffusion Large Language Models

Chenglin Wang; Yucheng Zhou; Shawn Chen; Tao Wang; Kai Zhang

arXiv:2603.13450·cs.CV·April 13, 2026

TL;DR

LADR is a training-free, spatially-aware method that accelerates diffusion-based text-to-image generation by approximately four times while maintaining or improving image quality.

Contribution

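LADR is a training-free method that exploits the spatial Markov property of images to reduce inference latency in discrete diffusion language models for text-to-image generation. It prioritizes recovering tokens at the "generation frontier", the regions spatially adjacent to already-observed pixels.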

Findings

01

Achieves about 4x inference speedup over baselines.

02

Maintains or improves image quality, especially in spatial reasoning.

03

Effective across four benchmark datasets.

Abstract

Discrete Diffusion Language Models have emerged as a compelling paradigm for unified multimodal generation, yet their deployment is hindered by high inference latency arising from iterative decoding. Existing acceleration strategies often require expensive re-training or fail to leverage the 2D spatial redundancy inherent in visual data. To address this, we propose Locality-Aware Dynamic Rescue (LADR), a training-free method that expedites inference by exploiting the spatial Markov property of images. LADR prioritizes the recovery of tokens at the "generation frontier", regions spatially adjacent to observed pixels, thereby maximizing information gain. Specifically, our method integrates morphological neighbor identification to locate candidate tokens, employs a risk-bounded filtering mechanism to prevent error propagation, and utilizes manifold-consistent inverse scheduling to align…
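
To make the "generation frontier" idea concrete, here is a minimal sketch of how still-masked tokens adjacent to observed ones might be located on a 2D token grid. It is not the authors' implementation. The function names `frontier_mask` and `select_tokens`, the 4-neighbour adjacency, and the fixed confidence threshold are all assumptions made for illustration; in particular, the threshold is only a stand-in for the risk-bounded filtering the abstract mentions, whose actual form is not given here.

```python
from typing import List

BoolGrid = List[List[bool]]
FloatGrid = List[List[float]]

# Hypothetical choice: 4-neighbour (cross-shaped) adjacency.
NEIGHBOUR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def frontier_mask(observed: BoolGrid) -> BoolGrid:
    """Mark masked positions that touch at least one observed token.

    Equivalent to a one-step binary dilation of `observed` with a
    cross-shaped structuring element, minus `observed` itself.
    """
    rows, cols = len(observed), len(observed[0])
    frontier = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if observed[r][c]:
                continue  # already decoded, so not a candidate
            for dr, dc in NEIGHBOUR_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and observed[nr][nc]:
                    frontier[r][c] = True
                    break
    return frontier


def select_tokens(observed: BoolGrid, confidence: FloatGrid,
                  threshold: float = 0.9) -> BoolGrid:
    """Assumed filter: keep frontier tokens whose confidence >= threshold."""
    frontier = frontier_mask(observed)
    return [
        [frontier[r][c] and confidence[r][c] >= threshold
         for c in range(len(observed[0]))]
        for r in range(len(observed))
    ]
```

In a decoding loop, the positions returned by `select_tokens` would be the extra tokens unmasked in the current step, which is how a locality-aware scheme could cut the number of iterations.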
