Improving Sampling for Masked Diffusion Models via Information Gain
Kaisen Yang, Jayden Teoh, Kaicheng Yang, Yitong Zhang, Alex Lamb

TL;DR
This paper introduces the Info-Gain Sampler, a new decoding method for Masked Diffusion Models that considers future uncertainty and information gain, significantly improving performance across various tasks.
Contribution
It proposes a novel decoding framework that balances immediate uncertainty with future information gain, addressing limitations of greedy heuristics in MDMs.
Findings
Achieves 3.6% higher accuracy on reasoning tasks
Attains 63.1% win-rate in creative writing
Reduces cumulative uncertainty from 78.4 to 48.6
Abstract
Masked Diffusion Models (MDMs) offer greater flexibility in decoding order than autoregressive models but require careful planning to achieve high-quality generation. Existing samplers typically adopt greedy heuristics, prioritizing positions with the highest local certainty to decode at each step. Through failure case analysis, we identify a fundamental limitation of this approach: it neglects the downstream impact of current decoding choices on subsequent steps and fails to minimize cumulative uncertainty. In particular, these methods do not fully exploit the non-causal nature of MDMs, which enables evaluating how a decoding decision reshapes token probabilities/uncertainty across all remaining masked positions. To bridge this gap, we propose the Info-Gain Sampler, a principled decoding framework that balances immediate uncertainty with information gain over future masked tokens.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
