The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Zanlin Ni, Shenzhi Wang, Yang Yue, Tianyu Yu, Weilin Zhao, Yeguo Hua, Tianyi Chen, Jun Song, Cheng Yu, Bo Zheng, Gao Huang

TL;DR
This paper challenges the assumption that arbitrary-order generation in diffusion language models enhances reasoning. It shows that, in its current form, this flexibility narrows rather than expands reasoning capability, and it proposes a simpler, more effective training approach called JustGRPO.
Contribution
The paper shows that arbitrary-order generation in diffusion LLMs (dLLMs) can limit reasoning. It also introduces JustGRPO, a straightforward training method that improves reasoning while preserving parallel decoding.
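JustGRPO is named after Group Relative Policy Optimization (GRPO). This summary does not describe how JustGRPO applies GRPO to dLLMs, so the sketch below covers only the standard GRPO group-relative advantage. It makes no claim about the authors' implementation. For one prompt, several completions are sampled and scored. Each completion's advantage is its reward minus the group mean, divided by the group standard deviation. The function name `group_relative_advantages` and the `eps` stabilizer are illustrative choices. The sketch uses the population standard deviation, although implementations differ on this detail.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO-style advantage for one group of sampled completions.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero when all
    rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical example: four completions for one math prompt,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

Because advantages are relative within a group, a prompt whose completions all receive the same reward yields zero advantage for every completion and therefore contributes no learning signal.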
Findings
dLLMs tend to use arbitrary-order generation to bypass high-uncertainty tokens that are crucial for exploration, which causes the solution space to collapse prematurely and reduces reasoning potential.
JustGRPO achieves 89.1% accuracy on GSM8K, outperforming some existing methods.
Simple training with JustGRPO is more effective for dLLM reasoning than more complex RL approaches.
Abstract
Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs, enabling token generation in arbitrary orders. Intuitively, this flexibility implies a solution space that strictly supersets the fixed autoregressive trajectory, theoretically unlocking superior reasoning potential for general tasks like mathematics and coding. Consequently, numerous works have leveraged reinforcement learning (RL) to elicit the reasoning capability of dLLMs. In this paper, we reveal a counter-intuitive reality: arbitrary order generation, in its current form, narrows rather than expands the reasoning boundary of dLLMs. We find that dLLMs tend to exploit this order flexibility to bypass high-uncertainty tokens that are crucial for exploration, leading to a premature collapse of the solution space. This observation motivates a rethink of RL approaches for dLLMs, where…
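The abstract's central mechanism can be illustrated with a toy model. When a dLLM is free to choose its generation order, it tends to postpone high-uncertainty tokens. The sketch below is not the paper's decoder. The per-position distributions are hypothetical, and they are held fixed rather than recomputed after each unmasking step as a real dLLM would do. The rule "unmask the lowest-entropy position first" is an assumed stand-in for confidence-based arbitrary-order decoding. The functions `left_to_right_order` and `confidence_first_order` are illustrative names.

```python
import math
from typing import Dict, List


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution over tokens."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def left_to_right_order(n: int) -> List[int]:
    """Autoregressive order: positions 0, 1, ..., n-1."""
    return list(range(n))


def confidence_first_order(dists: List[Dict[str, float]]) -> List[int]:
    """Toy arbitrary-order policy: unmask positions from lowest to highest
    entropy (ties broken by position). Simplification: the distributions
    are static, so this reduces to a sort by entropy."""
    return sorted(range(len(dists)), key=lambda i: (entropy(dists[i]), i))


# Hypothetical distributions for a 4-token span; position 1 is a
# high-uncertainty "fork" token where the reasoning could branch.
dists = [
    {"The": 0.95, "A": 0.05},
    {"so": 0.5, "but": 0.5},
    {"answer": 0.9, "result": 0.1},
    {"is": 0.99, "was": 0.01},
]
print(left_to_right_order(len(dists)))  # [0, 1, 2, 3]
print(confidence_first_order(dists))    # [3, 0, 2, 1]
```

In the left-to-right order, the fork token at position 1 is decided early, before later tokens are fixed. The confidence-first order resolves it last, after the surrounding tokens already constrain it. That ordering is the kind of bypassing of high-uncertainty tokens the abstract describes.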
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
