TL;DR
This paper introduces DUS, a dilated unmasking scheduler for masked diffusion language models (MDLMs). It unmasks groups of non-adjacent tokens in parallel, giving a predictable speedup and outperforming confidence-based planners on math, code, general-knowledge, and instruction-following benchmarks.
Contribution
The paper proposes a novel, inference-only scheduler that partitions sequence positions into dilated groups for parallel unmasking, improving the speed-quality trade-off in MDLMs.
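The partitioning idea can be illustrated with a minimal sketch. It assumes the simplest possible dilation, a single fixed stride, which is not necessarily the grouping rule the paper uses, and it does not model the entropy bound mentioned in the abstract. The names `dilated_groups` and `stride` are hypothetical and chosen only for this illustration.

```python
from typing import List


def dilated_groups(seq_len: int, stride: int) -> List[List[int]]:
    """Split positions 0..seq_len-1 into groups of positions spaced `stride` apart.

    Assumption: a fixed stride stands in for the paper's dilation pattern.
    With stride >= 2, no two positions in the same group are adjacent.
    """
    if seq_len < 0 or stride < 1:
        raise ValueError("seq_len must be >= 0 and stride must be >= 1")
    # Group k holds positions k, k + stride, k + 2 * stride, ...
    return [list(range(offset, seq_len, stride))
            for offset in range(min(stride, seq_len))]


# Each group would be unmasked with one network call, so an 8-token block
# with stride 4 needs 4 calls instead of the 8 used by token-by-token decoding.
groups = dilated_groups(seq_len=8, stride=4)
for step, positions in enumerate(groups):
    print(f"step {step}: unmask positions {positions}")
```

Under this assumption the number of network calls equals the number of groups, which is one way to read the "predictable speedup" claim: the call count is fixed by the schedule rather than by model confidence at run time.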
Findings
DUS achieves up to 5.8x speedup over token-by-token decoding.
DUS outperforms confidence-based planners across multiple benchmarks.
Dilated spacing enhances adaptive samplers when used as a post-filter.
Abstract
Masked diffusion language models (MDLMs) promise fast, non-autoregressive text generation, yet existing samplers, which pick tokens to unmask based on model confidence, ignore interactions when unmasking multiple positions in parallel and effectively reduce to slow, autoregressive behavior. We propose the Dilated Unmasking Scheduler (DUS), an inference-only, planner-model-free method that partitions sequence positions into non-adjacent dilated groups and unmasks them in parallel so as to minimize an upper bound on joint entropy gain at each denoising step. By explicitly trading off the number of network calls against generation quality, DUS recovers most of the performance lost under traditional parallel unmasking strategies. Across math (GSM8K, MATH500), code (HumanEval, MBPP), general-knowledge (BBH, MMLU-Pro), and instruction following (IFEval) benchmarks, DUS outperforms…
