Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

Gen Li; Changxiao Cai

arXiv:2505.21400 · cs.LG · January 9, 2026

Open Access

TL;DR

This paper gives a theoretical analysis showing that diffusion language models can generate high-quality text samples in fewer iterations than the sequence length, breaking the sampling bottleneck of traditional autoregressive (AR) models.

Contribution

The paper develops convergence guarantees for diffusion language models within an information-theoretic framework, showing that they can sample more efficiently than AR models.

Findings

01. Sampling error, measured by KL divergence, decays inversely with the number of iterations $T$.

02. Sampling error scales linearly with the mutual information between tokens.

03. High-quality samples are achievable with fewer iterations than the sequence length $L$.
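
Read together, the first two findings suggest a bound of the following schematic shape. This is only a sketch: the constant $C$, the symbol $\mathcal{I}$ (standing in for the mutual-information quantity between tokens), the names $p_{\text{data}}$ and $p_{\text{output}}$, and the direction of the KL divergence are all placeholders introduced here, and the paper's exact statement may contain further terms.

```latex
\mathrm{KL}\left(p_{\text{data}} \,\|\, p_{\text{output}}\right)
  \;\lesssim\; \frac{C \cdot \mathcal{I}}{T}
```

Under this reading, the right-hand side can already be small for $T < L$ when the tokens are only weakly dependent, which is consistent with the third finding.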

Abstract

Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially, diffusion models allow for parallel sampling, offering a promising path to accelerate generation and eliminate the left-to-right generation constraints. Despite their empirical success, the theoretical understanding of diffusion language models remains underdeveloped. In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective. Our analysis demonstrates that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations $T$ and scales linearly with the mutual information between tokens in the target text sequence. Crucially, our theory covers the…
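
To give some intuition for why mutual information appears in the error, here is a minimal toy sketch, not the paper's sampler or its bound. It assumes a two-token sequence over a binary vocabulary and a hypothetical one-step sampler that draws both tokens in parallel from their exact marginals. By a standard identity, the KL divergence between the target joint distribution and that product of marginals equals the mutual information between the two tokens. All function names and the example distributions are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy in nats of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """KL(p || q) in nats for two probability vectors on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def one_step_parallel_error(joint):
    """Compare the target joint pmf of two tokens with parallel sampling.

    `joint[i][j]` is the target probability of (token1=i, token2=j).
    Returns (KL between target and product of marginals, mutual information).
    """
    p_first = [sum(row) for row in joint]          # marginal of token 1
    p_second = [sum(col) for col in zip(*joint)]   # marginal of token 2
    target = [p for row in joint for p in row]     # flattened joint pmf
    # Hypothetical one-step sampler: both tokens drawn independently.
    parallel = [a * b for a in p_first for b in p_second]
    kl = kl_divergence(target, parallel)
    # Mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y).
    mi = entropy(p_first) + entropy(p_second) - entropy(target)
    return kl, mi


# Illustrative distributions (made up for this example).
correlated = [[0.45, 0.05], [0.05, 0.45]]
independent = [[0.25, 0.25], [0.25, 0.25]]

for name, joint in [("correlated", correlated), ("independent", independent)]:
    kl, mi = one_step_parallel_error(joint)
    print(f"{name}: KL = {kl:.4f} nats, mutual information = {mi:.4f} nats")
```

In this toy setting the two printed quantities coincide for each distribution, and both vanish when the tokens are independent: parallel sampling costs nothing when tokens carry no information about each other, and the cost grows with their dependence.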

Taxonomy

Topics: Natural Language Processing Techniques · Language and cultural evolution

Methods: Diffusion