No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
Zachary Horvitz, Raghav Singhal, Hao Zou, Carles Domingo-Enrich, Zhou Yu, Rajesh Ranganath, Kathleen McKeown

TL;DR
This paper introduces reasoning-as-infilling and multi-token entropy decoding for masked diffusion language models, improving reasoning, scoring, and sampling efficiency in tasks like math and coding.
Contribution
It proposes novel inference and training methods for MDLMs, enhancing reasoning, uncertainty estimation, and efficiency over traditional decoding approaches.
Findings
Fine-tuning on posterior reasoning traces boosts performance.
Reasoning-as-infilling enables scoring intermediate reasoning steps.
Multi-token entropy decoding (MED) reduces decoding steps by 2.7x while maintaining accuracy.
Abstract
Masked diffusion language models (MDLMs) are trained to in-fill positions in randomly masked sequences, in contrast to next-token prediction models. Discussions around MDLMs focus on two benefits: (1) any-order decoding and (2) multi-token decoding. However, we observe that for math and coding tasks, any-order algorithms often underperform or behave similarly to left-to-right sampling, and standard multi-token decoding significantly degrades performance. At inference time, MDLMs compute the conditional distribution of all masked positions. A natural question is: How can we justify this additional compute when left-to-right one-token-at-a-time decoding is on par with any-order decoding algorithms? First, we propose reasoning-as-infilling. By using MDLMs to infill a reasoning template, we can structure outputs and distinguish between reasoning and answer tokens. In turn, this enables…
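The abstract is cut off before it describes MED, so the paper's exact decoding rule is not shown here. As a rough illustration only, the sketch below assumes one plausible reading of "multi-token entropy decoding": since the model already produces a conditional distribution for every masked position, a single step can commit all positions whose predictive entropy falls under a cutoff, and fall back to the single most confident position otherwise. The function names, the `threshold` parameter, and the toy distributions are hypothetical and do not come from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_positions(dists, masked, threshold):
    """Choose which masked positions to decode in a single step.

    dists:     dict mapping position -> list of token probabilities
    masked:    iterable of positions that are still masked
    threshold: entropy cutoff (hypothetical hyperparameter, an assumption)
    """
    scored = sorted((entropy(dists[i]), i) for i in masked)
    chosen = [i for h, i in scored if h <= threshold]
    # Always decode at least the lowest-entropy position so decoding advances.
    if not chosen and scored:
        chosen = [scored[0][1]]
    return sorted(chosen)

def decode_step(dists, masked, threshold):
    """Greedily fill the selected positions; returns {position: token_id}."""
    picks = select_positions(dists, masked, threshold)
    return {i: max(range(len(dists[i])), key=dists[i].__getitem__) for i in picks}

# Toy conditionals over a 4-token vocabulary for three masked positions.
toy_dists = {
    0: [0.97, 0.01, 0.01, 0.01],  # confident
    1: [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    2: [0.02, 0.96, 0.01, 0.01],  # confident
}

filled = decode_step(toy_dists, masked=[0, 1, 2], threshold=0.5)
print(filled)
```

With this toy input, positions 0 and 2 are committed in the same step while the uncertain position 1 stays masked for a later pass, which is the sense in which such a scheme could cut the number of decoding steps without committing to low-confidence tokens.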
Taxonomy
Topics: Topic Modeling · Language and cultural evolution · Natural Language Processing Techniques
