S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation

Ligong Han; Hao Wang; Han Gao; Kai Xu; Akash Srivastava

arXiv:2603.25702 · cs.CL · March 27, 2026

Open Access

TL;DR

S2D2 is a training-free self-speculative decoding method for block-diffusion language models. It combines parallel diffusion drafting with autoregressive verification by the same pretrained model, improving both speed and accuracy without extra training or test-time compute.

Contribution

The paper proposes a hybrid decoding framework in which a single pretrained block-diffusion model serves as both drafter and verifier, improving speed and accuracy without additional training.
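
The drafter/verifier split can be illustrated with a generic greedy speculative-verification loop. This is a minimal sketch, not the paper's acceptance rule: the names `verify_draft`, `ar_next`, and `toy_ar_next` are hypothetical, the verifier is a toy function rather than a language model, and a real verifier would typically score all drafted positions in one forward pass rather than one at a time.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    ar_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Keep the longest drafted prefix the verifier agrees with.

    `ar_next` is a hypothetical stand-in for running the model
    autoregressively (block size one) and taking its greedy next token.
    At the first disagreement the verifier's token replaces the draft
    token and the rest of the draft is discarded.
    """
    accepted: List[int] = []
    context = list(prefix)
    for tok in draft:
        target = ar_next(context)
        if tok != target:
            accepted.append(target)  # verifier's correction
            break
        accepted.append(tok)
        context.append(tok)
    return accepted


def toy_ar_next(context: Sequence[int]) -> int:
    """Toy verifier (assumption): next token is last token + 1, modulo 10."""
    return (context[-1] + 1) % 10 if context else 0


# The draft agrees with the verifier for two tokens, then diverges.
result = verify_draft([1, 2], [3, 4, 9, 6], toy_ar_next)
print(result)
```

Under this rule every verification call yields at least one token whenever the draft is non-empty, so decoding always advances even if the whole draft is rejected.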

Findings

01. Up to 4.7× speedup over autoregressive decoding on SDAR

02. Up to 1.57× speedup over a tuned dynamic decoding baseline

03. Accuracy improved by up to 4.5 points

Abstract

Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion…
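
The threshold trade-off described in the abstract can be made concrete with a toy simulation. This is a sketch under stated assumptions, not the paper's decoder: `toy_denoiser` is a hypothetical stand-in that returns random confidences, and the rule of always committing the single most confident position when nothing clears the threshold is an illustrative choice to guarantee progress.

```python
import random

MASK = None  # placeholder for a still-masked position


def toy_denoiser(tokens, rng):
    """Hypothetical denoiser: a (token, confidence) guess per masked slot."""
    return {
        i: (f"tok{i}", rng.random())
        for i, t in enumerate(tokens)
        if t is MASK
    }


def threshold_decode(block_size, threshold, seed=0):
    """Unmask a block, committing predictions with confidence >= threshold.

    Returns the filled block and the number of denoising steps used.
    """
    rng = random.Random(seed)
    tokens = [MASK] * block_size
    steps = 0
    while any(t is MASK for t in tokens):
        preds = toy_denoiser(tokens, rng)
        chosen = [i for i, (_, conf) in preds.items() if conf >= threshold]
        if not chosen:
            # Nothing cleared the bar: commit the most confident position
            # so that each step makes progress (illustrative assumption).
            chosen = [max(preds, key=lambda i: preds[i][1])]
        for i in chosen:
            tokens[i] = preds[i][0]
        steps += 1
    return tokens, steps


_, aggressive_steps = threshold_decode(block_size=8, threshold=0.0)
_, conservative_steps = threshold_decode(block_size=8, threshold=0.95)
print(aggressive_steps, conservative_steps)
```

An aggressive threshold finishes the block in one step but commits low-confidence tokens, while a conservative threshold needs more denoising steps, which is the brittleness the abstract refers to.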

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis