$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

Zhenbang Du; Kejing Xia; Xinrui Zhong; Yonggan Fu; Nicolai Oswald; Binfei Ji; Brucek Khailany; Pavlo Molchanov; Yingyan Lin

arXiv:2604.18995 · cs.CL · April 22, 2026

TL;DR

This paper introduces $R^2$-dLLM, a framework that reduces decoding redundancy in diffusion large language models, cutting decoding steps by up to 75% while maintaining competitive generation quality.

Contribution

The paper proposes a unified approach combining inference-time rules and supervised fine-tuning to reduce spatial and temporal decoding redundancy in dLLMs.

Findings

01 Reduces decoding steps by up to 75%

02 Maintains competitive generation quality

03 Validates decoding redundancy as a key efficiency bottleneck

Abstract

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to autoregressive generation by enabling parallel token prediction. However, practical dLLM decoding still suffers from high inference latency, which limits deployment. In this work, we observe that a substantial part of this inefficiency comes from recurring redundancy in the decoding process, including spatial redundancy caused by confidence clusters and positional ambiguity, and temporal redundancy caused by repeatedly remasking predictions that have already stabilized. Motivated by these patterns, we propose $R^{2}$-dLLM, a unified framework for reducing decoding redundancy from both inference and training perspectives. At inference time, we introduce training-free decoding rules that aggregate local confidence and token predictions, and finalize temporally stable tokens to avoid redundant decoding steps.…
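To make the two inference-time ideas in the abstract concrete, here is a toy sketch, not the paper's actual rules. It assumes that "aggregating local confidence" can be approximated by averaging confidence over a small window, and that a token counts as "temporally stable" once its prediction is unchanged for a fixed number of consecutive steps. The names `smooth_confidence` and `decode_step`, and the parameters `radius`, `conf_threshold`, and `patience`, are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def smooth_confidence(conf: Sequence[float], radius: int = 1) -> List[float]:
    """Average each position's confidence with neighbours within `radius`.

    Assumption: a plain windowed mean stands in for local aggregation.
    """
    n = len(conf)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(conf[lo:hi]) / (hi - lo))
    return out


def decode_step(
    finalized: List[Optional[int]],
    streaks: List[int],
    last_pred: List[Optional[int]],
    preds: Sequence[int],
    conf: Sequence[float],
    conf_threshold: float = 0.9,  # hypothetical value
    patience: int = 2,            # hypothetical value
    radius: int = 1,              # hypothetical value
) -> List[Optional[int]]:
    """Apply one toy decoding step, mutating the state lists in place.

    A position is finalized when its smoothed confidence clears
    `conf_threshold`, or when its predicted token has stayed the same
    for `patience` consecutive steps. Finalized positions are skipped
    afterwards, so they are never remasked.
    """
    agg = smooth_confidence(conf, radius)
    for i, tok in enumerate(preds):
        if finalized[i] is not None:
            continue  # already committed; do not revisit
        # Count how many consecutive steps this position predicted `tok`.
        streaks[i] = streaks[i] + 1 if last_pred[i] == tok else 1
        last_pred[i] = tok
        if agg[i] >= conf_threshold or streaks[i] >= patience:
            finalized[i] = tok
    return finalized


# Usage: three positions, three simulated steps of (predictions, confidences).
state: List[Optional[int]] = [None, None, None]
streaks = [0, 0, 0]
last: List[Optional[int]] = [None, None, None]
steps = [
    ([5, 7, 9], [0.95, 0.95, 0.2]),
    ([5, 7, 8], [0.95, 0.5, 0.3]),
    ([5, 7, 8], [0.9, 0.9, 0.3]),
]
history = []
for preds, conf in steps:
    decode_step(state, streaks, last, preds, conf)
    history.append(list(state))
```

In this toy run, position 0 is committed by the confidence rule on the first step, while positions 1 and 2 are committed by the stability rule once their predictions repeat, which is the kind of early finalization that avoids further decoding steps on tokens that have already settled.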
