Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
Gen Li, Changxiao Cai

TL;DR
This paper provides a theoretical analysis showing that diffusion language models can generate high-quality text samples in fewer iterations than the sequence length, breaking the sampling bottleneck of autoregressive (AR) models.
Contribution
The paper develops information-theoretic convergence guarantees for diffusion language models, showing that they can outperform AR models in sampling efficiency.
Findings
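Sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations T
Sampling error scales linearly with the mutual information between tokens in the target text sequence
High-quality samples are achievable with fewer than L iterations, where L is the sequence length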
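The three findings can be combined into one schematic bound. This is only a sketch of the scaling described above, not the paper's theorem. The notation is assumed for illustration: p is the target distribution, \hat{p}_T is the distribution of the sampler output after T iterations, I stands for the mutual-information quantity between tokens, C is an unspecified constant, and \varepsilon is a target accuracy. The direction of the KL divergence is also an assumption, and any additional error terms the paper may include are omitted.

```latex
% Schematic scaling only; C, I, \hat{p}_T and the KL direction are assumed notation.
\mathrm{KL}\bigl(p \,\|\, \hat{p}_T\bigr) \;\lesssim\; \frac{C \cdot I}{T}
\qquad\Longrightarrow\qquad
T \;\gtrsim\; \frac{C \cdot I}{\varepsilon}
\quad \text{iterations suffice for } \mathrm{KL}\bigl(p \,\|\, \hat{p}_T\bigr) \le \varepsilon .
```

Read this way, the number of iterations needed is governed by the dependence between tokens rather than by the sequence length itself, which is how fewer than L iterations can be enough when that dependence is weak.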
Abstract
Diffusion models have emerged as a powerful paradigm for modern generative modeling, demonstrating strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models that generate tokens sequentially, diffusion models allow for parallel sampling, offering a promising path to accelerate generation and eliminate the left-to-right generation constraints. Despite their empirical success, theoretical understandings of diffusion language models remain underdeveloped. In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective. Our analysis demonstrates that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations and scales linearly with the mutual information between tokens in the target text sequence. Crucially, our theory covers the…
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution
Methods: Diffusion
