DOS: Dependency-Oriented Sampler for Masked Diffusion Language Models

Xueyu Zhou, Yangrong Hu, and Jian Huang

arXiv:2603.15340 · cs.CL · March 17, 2026

Open Access

TL;DR

This paper introduces DOS, a new decoding strategy for masked diffusion language models that uses inter-token dependencies from attention matrices to improve generation quality and efficiency.

Contribution

The paper presents DOS, a training-free, dependency-oriented sampling method that uses transformer attention to improve decoding in masked diffusion language models (MDLMs), addressing the limitations of approaches based on token-level uncertainty.

Findings

01. DOS outperforms existing decoding strategies on code generation tasks.

02. DOS improves performance on mathematical reasoning tasks.

03. DOS improves generation efficiency when combined with parallel sampling methods.

Abstract

Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs predominantly rely on token-level uncertainty criteria, while largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits attention matrices from transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning…
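To make the idea in the abstract concrete, here is a minimal sketch of one way an attention-based dependency score could drive the choice of which masked positions to update. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the scoring rule, so the score used here (the attention mass a masked position places on already-unmasked tokens) and the helper names `dependency_scores` and `select_positions` are hypothetical. The attention matrix is assumed to be already averaged over heads and layers, with rows summing to 1.

```python
from typing import List, Sequence


def dependency_scores(attn: Sequence[Sequence[float]],
                      is_masked: Sequence[bool]) -> List[float]:
    """Score each masked position by the attention it pays to unmasked tokens.

    attn[i][j] is the attention weight from query position i to key
    position j (assumed averaged over heads and layers, rows sum to 1).
    Unmasked positions get -inf because they need no update.
    NOTE: this scoring rule is an assumption made for illustration.
    """
    scores = []
    for i, row in enumerate(attn):
        if not is_masked[i]:
            scores.append(float("-inf"))
            continue
        scores.append(sum(w for j, w in enumerate(row) if not is_masked[j]))
    return scores


def select_positions(attn: Sequence[Sequence[float]],
                     is_masked: Sequence[bool],
                     k: int) -> List[int]:
    """Return up to k masked positions with the largest dependency score."""
    scores = dependency_scores(attn, is_masked)
    masked = [i for i, m in enumerate(is_masked) if m]
    masked.sort(key=lambda i: scores[i], reverse=True)
    return masked[:k]


# Toy example: 4 positions; positions 0 and 1 are already decoded.
attn = [
    [0.70, 0.20, 0.05, 0.05],
    [0.30, 0.60, 0.05, 0.05],
    [0.40, 0.40, 0.10, 0.10],  # position 2 attends mostly to decoded tokens
    [0.10, 0.10, 0.40, 0.40],  # position 3 attends mostly to masked tokens
]
is_masked = [False, False, True, True]
chosen = select_positions(attn, is_masked, k=1)
print(chosen)
```

In this toy setup, position 2 is selected first because most of its attention falls on tokens that are already decoded. A token-level uncertainty criterion would instead rank positions by the model's predictive confidence at each position and would not consult the attention matrix at all.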

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Natural Language Processing Techniques