Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
Liancheng Fang, Aiwei Liu, Henry Peng Zou, Yankai Chen, Enze Ma, Leyi Pan, Chunyu Miao, Wei-Chieh Huang, Xue Liu, Philip S. Yu

TL;DR
This paper analyzes the tradeoff between generation quality and exploration in diffusion language models and proposes a sampling method that improves reasoning performance by balancing the two.
Contribution
The paper introduces a theoretical framework for the quality-exploration dilemma and develops an Independent Metropolis–Hastings sampler that better balances quality and exploration during decoding.
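For readers unfamiliar with the sampler family named above, here is a minimal sketch of a generic Independent Metropolis–Hastings loop on a toy three-state target. It is not the paper's decoding algorithm: the target weights, the uniform proposal, and all function names (`independent_mh`, `run_demo`, `WEIGHTS`) are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter

# Hypothetical unnormalized target over three states (illustration only).
WEIGHTS = [1.0, 2.0, 7.0]


def log_target(state):
    """Log of the unnormalized target probability of a state."""
    return math.log(WEIGHTS[state])


def propose(rng):
    """Independence proposal: uniform over states, ignores the current state."""
    return rng.randrange(len(WEIGHTS))


def log_proposal(state):
    """Log-probability of a state under the uniform proposal."""
    return -math.log(len(WEIGHTS))


def independent_mh(log_target, propose, log_proposal, n_steps, rng):
    """Generic Independent Metropolis-Hastings chain.

    Accepts a proposal y from current state x with probability
    min(1, [pi(y) * q(x)] / [pi(x) * q(y)]).
    """
    x = propose(rng)
    samples = []
    for _ in range(n_steps):
        y = propose(rng)
        log_alpha = (log_target(y) - log_proposal(y)) - (
            log_target(x) - log_proposal(x)
        )
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y  # accept the proposal
        samples.append(x)  # on rejection the current state is repeated
    return samples


def run_demo(n_steps=20000, seed=0):
    """Return the empirical frequency of each state along the chain."""
    rng = random.Random(seed)
    samples = independent_mh(log_target, propose, log_proposal, n_steps, rng)
    counts = Counter(samples)
    return [counts[s] / n_steps for s in range(len(WEIGHTS))]
```

Calling `run_demo()` should give frequencies close to the normalized weights (0.1, 0.2, 0.7), since the chain's stationary distribution is the target. In a decoding setting the states would be whole sequences and the proposal a model-based sampler, but that mapping is not spelled out in this summary.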
Findings
The proposed method outperforms both random remasking and low-confidence remasking on reasoning benchmarks.
Low-confidence remasking constrains sequence entropy, limiting exploration.
The new sampler achieves a better exploration-quality tradeoff.
Abstract
Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma. In this paper, we provide a unified explanation of this dilemma. We show that low-confidence remasking improves a myopic proxy for quality while provably constraining the entropy of the induced sequence distribution. To overcome this limitation, we characterize the optimal distribution that explicitly balances quality and exploration, and develop a simple…
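The contrast between the two decoding orders described in the abstract can be illustrated with a toy position-selection step. This is a sketch under stated assumptions, not the paper's or any library's implementation: `select_positions` is a hypothetical helper, and "confidence" is assumed to be a single score per masked position (for example, the model's highest predicted token probability there).

```python
import random


def select_positions(confidences, masked, k, strategy, rng=None):
    """Choose k masked positions to commit in one decoding step.

    confidences: per-position confidence scores (assumed given by the model).
    masked: iterable of positions that are still masked.
    strategy: "low_confidence" keeps the k most confident predictions and
              leaves the rest masked; "random" picks k positions uniformly.
    """
    candidates = sorted(masked)
    if k > len(candidates):
        raise ValueError("k exceeds the number of masked positions")
    if strategy == "low_confidence":
        # Deterministic given the scores: this is what limits exploration.
        ranked = sorted(candidates, key=lambda i: confidences[i], reverse=True)
        return sorted(ranked[:k])
    if strategy == "random":
        rng = rng or random.Random()
        return sorted(rng.sample(candidates, k))
    raise ValueError(f"unknown strategy: {strategy}")


# Toy example: four masked positions with hypothetical confidence scores.
scores = [0.90, 0.20, 0.60, 0.95]
greedy = select_positions(scores, {0, 1, 2, 3}, 2, "low_confidence")
shuffled = select_positions(scores, {0, 1, 2, 3}, 2, "random", random.Random(0))
```

Here `greedy` is always `[0, 3]` for these scores, while `shuffled` depends on the random seed. Repeated sampling with the confidence-based rule therefore tends to revisit the same decoding order, which matches the intuition in the abstract that it favors single-sample quality at the cost of diversity across samples.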
