Lookahead Unmasking Elicits Accurate Decoding in Diffusion Language Models
Sanghyun Lee, Seungryong Kim, Jongho Park, Dongmin Park

TL;DR
This paper introduces Lookahead Unmasking (LookUM), a decoding method for diffusion language models that selects among candidate unmasking paths using uncertainty-based verification. It improves accuracy across six benchmarks and needs only two to three paths to reach peak performance.
Contribution
The paper presents LookUM, a new path selection framework for diffusion language models that leverages uncertainty to improve decoding without external rewards.
Findings
LookUM improves performance across six benchmarks.
Only two to three paths are needed for peak performance.
LLaDA with LookUM rivals RL-tuned LLaDA 1.5.
Abstract
Masked Diffusion Models (MDMs) as language models generate by iteratively unmasking tokens, yet their performance crucially depends on the inference-time order of unmasking. Prevailing heuristics, such as confidence-based sampling, are myopic: they optimize locally, fail to leverage extra test-time compute, and let early decoding mistakes cascade. We propose Lookahead Unmasking (LookUM), which addresses these concerns by reformulating sampling as path selection over all possible unmasking orders without the need for an external reward model. Our framework couples (i) a path generator that proposes paths by sampling from pools of unmasking sets with (ii) a verifier that computes the uncertainty of the proposed paths and performs importance sampling to select the final paths. Empirically, erroneous unmasking measurably inflates sequence-level uncertainty, and our method…
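The verifier step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that sequence-level uncertainty is the sum of per-position entropies over the still-masked positions, and that importance sampling uses weights proportional to exp(-uncertainty / temperature). The function names, the temperature parameter, and the example distributions are all hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sequence_uncertainty(dists):
    """Sum of per-position entropies (assumed stand-in for sequence-level uncertainty)."""
    return sum(entropy(d) for d in dists)


def resample_paths(candidates, uncertainties, k, temperature=1.0, rng=None):
    """Draw k candidates with weights exp(-uncertainty / temperature) (assumed weighting)."""
    rng = rng or random.Random(0)
    lowest = min(uncertainties)  # shift so the best candidate has weight 1
    weights = [math.exp(-(u - lowest) / temperature) for u in uncertainties]
    return rng.choices(candidates, weights=weights, k=k)


# Hypothetical predictive distributions over the remaining masked positions
# after committing each candidate unmasking set.
after_a = [[0.9, 0.1], [0.8, 0.2]]  # model is fairly confident
after_b = [[0.5, 0.5], [0.5, 0.5]]  # model is maximally unsure

candidates = ["path_a", "path_b"]
uncertainties = [sequence_uncertainty(after_a), sequence_uncertainty(after_b)]

# Lower-uncertainty paths are more likely to survive the resampling step.
survivors = resample_paths(candidates, uncertainties, k=2, temperature=0.05)
```

A lower temperature makes the selection closer to greedy (always keep the least uncertain path), while a higher one keeps more diversity among the surviving paths.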
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
