Theoretical Benefit and Limitation of Diffusion Language Model
Guhao Feng, Yihan Geng, Jian Guan, Wei Wu, Liwei Wang, Di He

TL;DR
This paper provides a theoretical analysis of Masked Diffusion Models for text generation, revealing their efficiency depends on the evaluation metric and highlighting limitations in achieving correctness for longer sequences.
Contribution
It offers the first theoretical framework for understanding the benefits and limitations of diffusion language models, especially MDMs, based on different evaluation metrics.
Findings
MDMs can achieve near-optimal perplexity with a number of sampling steps that does not grow with sequence length.
Achieving a low sequence error rate requires the number of sampling steps to scale linearly with sequence length.
Efficiency of MDMs is metric-dependent.
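The contrast between the two metrics can be made concrete with a small sketch. The Python below is only a toy illustration, not the paper's proof: the function names are hypothetical, and the assumption that token errors are independent is made here purely for illustration. It shows that perplexity is a per-token average, while sequence error rate counts a sequence as wrong if any single token is wrong, so a fixed per-token error compounds as the sequence gets longer.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the negative mean per-token log-probability (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def sequence_error_rate(per_token_error, length):
    """Probability that at least one of `length` tokens is wrong.

    Assumption (illustrative only): each token is wrong independently
    with probability `per_token_error`.
    """
    return 1.0 - (1.0 - per_token_error) ** length


def per_token_error_budget(target_ser, length):
    """Largest per-token error that keeps the sequence error rate at `target_ser`,
    under the same independence assumption."""
    return 1.0 - (1.0 - target_ser) ** (1.0 / length)


if __name__ == "__main__":
    # A per-token average is unaffected by length: same log-probs, same perplexity.
    short = [math.log(0.5)] * 8
    long = [math.log(0.5)] * 512
    print("perplexity (8 tokens):  ", perplexity(short))
    print("perplexity (512 tokens):", perplexity(long))

    # A whole-sequence metric degrades with length at a fixed per-token error.
    for length in (16, 128, 1024):
        ser = sequence_error_rate(0.01, length)
        budget = per_token_error_budget(0.05, length)
        print(f"L={length:5d}  SER at 1% token error: {ser:.4f}  "
              f"token error allowed for 5% SER: {budget:.6f}")
```

Under this toy model the per-token error budget shrinks roughly in proportion to 1/L for a fixed target sequence error rate, which gives some intuition for why a metric defined on whole sequences is more demanding on long outputs than a per-token average.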
Abstract
Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in a number of sampling steps that does not depend on sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate--which is important for understanding the "correctness" of a…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Diffusion
