Adaptation to Intrinsic Dependence in Diffusion Language Models
Yunxiao Zhao, Changxiao Cai

TL;DR
This paper introduces a distribution-agnostic, adaptive unmasking schedule for diffusion language models that improves sampling efficiency by accounting for the data's intrinsic dependence structure, with theoretical convergence guarantees.
Contribution
It proposes a novel randomized unmasking schedule that adapts to data dependence without prior knowledge, providing improved convergence guarantees for diffusion language models.
Findings
Convergence rates scale with total correlation measures of data dependence.
The method accelerates sampling for low-complexity distributions.
Guarantees hold in parallel-sampling regimes, enhancing practical applicability.
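The findings above state that convergence rates scale with total-correlation-type measures of dependence. For reference, one standard such measure is the total correlation of a random vector X = (X_1, ..., X_n), defined as TC(X) = sum_i H(X_i) - H(X_1, ..., X_n). It is zero exactly when the tokens are independent and grows as they become more strongly coupled. The sketch below computes TC for a small discrete joint distribution. It illustrates the textbook definition only. The exact dependence measures and constants in the paper's bounds are not given on this page, and the helper names are hypothetical.

```python
import itertools
import math


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def total_correlation(joint):
    """Total correlation TC(X) = sum_i H(X_i) - H(X_1, ..., X_n).

    `joint` is a hypothetical dict mapping token tuples (x_1, ..., x_n)
    to their probabilities under the joint distribution.
    """
    n = len(next(iter(joint)))
    marginal_entropy = 0.0
    for i in range(n):
        # Marginalize the joint distribution onto coordinate i.
        marginal = {}
        for x, p in joint.items():
            marginal[x[i]] = marginal.get(x[i], 0.0) + p
        marginal_entropy += entropy(marginal.values())
    return marginal_entropy - entropy(joint.values())


# Three independent uniform bits: no dependence, so TC is 0.
independent = {x: 1 / 8 for x in itertools.product([0, 1], repeat=3)}

# Three copies of one uniform bit: maximal coupling, TC = 2 log 2.
copied = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
```

Intuitively, a low-TC distribution has tokens that are nearly independent, so many of them can be revealed in parallel with little error. This is the kind of "low-complexity" structure the findings say the method adapts to.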
Abstract
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a rigid left-to-right order. Despite growing empirical success, the theoretical understanding of how unmasking schedules -- which specify the order and size of unmasked tokens during sampling -- affect generation quality remains limited. In this work, we introduce a distribution-agnostic unmasking schedule for DLMs that adapts to the (unknown) dependence structure of the target data distribution, without requiring any prior knowledge or hyperparameter tuning. In contrast to prior deterministic procedures that fix unmasking sizes, our method randomizes the number of tokens revealed at each iteration. We show that, for two specific parameter choices, the sampling convergence guarantees -- measured by Kullback-Leibler (KL) divergence…
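The abstract contrasts deterministic schedules, which fix how many tokens are unmasked per iteration, with a schedule that randomizes that number. The following sketch shows the distinction under stated assumptions. It is not the paper's schedule, whose size distribution and "two specific parameter choices" are elided above. Here the randomized sizes are drawn as a uniformly random split of the sequence into `num_steps` positive parts, which is purely an illustrative assumption. The positions revealed at each step come from a random permutation. All function names are hypothetical. In a real DLM, the tokens at each batch of positions would then be sampled from the model conditioned on everything already unmasked.

```python
import random


def deterministic_sizes(seq_len, num_steps):
    """Fixed schedule: spread seq_len tokens as evenly as possible over num_steps."""
    base, extra = divmod(seq_len, num_steps)
    return [base + (1 if i < extra else 0) for i in range(num_steps)]


def randomized_sizes(seq_len, num_steps, rng):
    """Illustrative randomized schedule (assumption, not the paper's):
    pick num_steps - 1 distinct cut points to split seq_len into positive sizes."""
    cuts = sorted(rng.sample(range(1, seq_len), num_steps - 1))
    bounds = [0] + cuts + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def unmasking_batches(seq_len, sizes, rng):
    """Assign masked positions to iterations: shuffle all positions,
    then cut the shuffled order into consecutive chunks of the given sizes."""
    order = list(range(seq_len))
    rng.shuffle(order)
    batches, start = [], 0
    for k in sizes:
        batches.append(sorted(order[start:start + k]))
        start += k
    return batches


rng = random.Random(0)
fixed = deterministic_sizes(10, 4)          # [3, 3, 2, 2]
random_sizes = randomized_sizes(16, 4, rng)  # random positive sizes summing to 16
batches = unmasking_batches(16, random_sizes, rng)
```

Both schedules reveal every position exactly once in `num_steps` iterations. They differ only in whether the per-step sizes are fixed in advance or drawn at random, and the abstract identifies that choice as the lever behind its adaptivity.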
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
