DOS: Dependency-Oriented Sampler for Masked Diffusion Language Models
Xueyu Zhou, Yangrong Hu, and Jian Huang

TL;DR
This paper introduces DOS, a new decoding strategy for masked diffusion language models that uses inter-token dependencies from attention matrices to improve generation quality and efficiency.
Contribution
The paper presents DOS, a training-free, dependency-oriented sampling method that uses transformer attention to guide decoding in masked diffusion language models (MDLMs), addressing the limitations of approaches based only on token-level uncertainty.
Findings
DOS outperforms existing decoding strategies on code generation tasks.
DOS improves performance on mathematical reasoning tasks.
DOS enhances generation efficiency when combined with parallel sampling methods.
Abstract
Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs predominantly rely on token-level uncertainty criteria, while largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits attention matrices from transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning…
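The abstract does not give the exact update rule, so the following is only a minimal sketch of the general idea, under stated assumptions: a single row-stochastic attention matrix is available, each masked position is scored by the attention mass it places on already-unmasked tokens, and the highest-scoring masked positions are updated first. The function names `dependency_scores` and `select_positions`, the scoring rule, and the top-k selection are all hypothetical illustrations, not the authors' method.

```python
import numpy as np


def dependency_scores(attn, is_masked):
    """Score each masked position by its attention mass on unmasked tokens.

    attn: (L, L) array, attn[i, j] = weight query position i puts on key j.
    is_masked: (L,) boolean array, True where the token is still masked.
    Returns an (L,) array; unmasked positions get -inf so they are never picked.
    NOTE: this scoring rule is an assumption made for illustration.
    """
    attn = np.asarray(attn, dtype=float)
    is_masked = np.asarray(is_masked, dtype=bool)
    # Sum attention over the columns that correspond to unmasked (known) tokens.
    mass_on_unmasked = attn[:, ~is_masked].sum(axis=1)
    return np.where(is_masked, mass_on_unmasked, -np.inf)


def select_positions(attn, is_masked, k):
    """Return (sorted) indices of up to k masked positions with the highest score."""
    scores = dependency_scores(attn, is_masked)
    k = min(k, int(np.sum(is_masked)))
    # Stable sort on negated scores: highest score first, ties broken by index.
    order = np.argsort(-scores, kind="stable")
    return sorted(order[:k].tolist())


if __name__ == "__main__":
    # Toy example: positions 1 and 2 are masked, 0 and 3 are known.
    attn = np.array([
        [0.70, 0.10, 0.10, 0.10],
        [0.40, 0.10, 0.10, 0.40],  # position 1 attends mostly to known tokens
        [0.10, 0.40, 0.40, 0.10],  # position 2 attends mostly to masked tokens
        [0.25, 0.25, 0.25, 0.25],
    ])
    is_masked = np.array([False, True, True, False])
    print(select_positions(attn, is_masked, k=1))
```

In this toy case position 1 is chosen before position 2 because more of its attention falls on tokens that are already decoded. A token-level uncertainty criterion would instead rank positions by the model's predictive confidence alone and would not use the attention matrix at all.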
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Natural Language Processing Techniques
