The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Zanlin Ni; Shenzhi Wang; Yang Yue; Tianyu Yu; Weilin Zhao; Yeguo Hua; Tianyi Chen; Jun Song; Cheng Yu; Bo Zheng; Gao Huang

arXiv:2601.15165·cs.CL·March 20, 2026

TL;DR

This paper challenges the assumption that arbitrary order generation in diffusion language models enhances reasoning, showing it often narrows reasoning capabilities and proposing a simpler, more effective training approach called JustGRPO.

Contribution

The paper reveals that arbitrary order generation in dLLMs can limit reasoning and introduces JustGRPO, a straightforward training method that improves reasoning without sacrificing parallel decoding.

Findings

01 Arbitrary order generation tends to bypass high-uncertainty tokens, reducing reasoning potential.

02 JustGRPO achieves 89.1% accuracy on GSM8K, outperforming some existing methods.

03 Simpler training with JustGRPO is more effective than complex RL approaches for reasoning in dLLMs.
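
The first finding can be pictured with a toy sketch. This is not the paper's code or its decoding procedure: the per-position distributions in `toy_dists` are invented for illustration, they are held fixed rather than recomputed after each step as a real dLLM would do, and the "lowest entropy first" rule in `confidence_first_order` is an assumed stand-in for a confidence-based unmasking heuristic. Under those assumptions, the sketch shows how an order-flexible decoder can push the most uncertain position to the very end, while a left-to-right decoder has to commit to it in place.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def left_to_right_order(dists):
    """Autoregressive-style order: positions are filled strictly in sequence."""
    return list(range(len(dists)))


def confidence_first_order(dists):
    """Hypothetical flexible order: fill the lowest-entropy position first.

    Ties are broken by position. Distributions are treated as static,
    which is a simplification made only for this illustration.
    """
    return sorted(range(len(dists)), key=lambda i: (entropy(dists[i]), i))


# Made-up distributions over a 4-token vocabulary, one per sequence position.
# Position 1 is the high-uncertainty token: all four options are equally likely.
toy_dists = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.90, 0.05, 0.03, 0.02],
    [0.85, 0.05, 0.05, 0.05],
]

ltr = left_to_right_order(toy_dists)
flexible = confidence_first_order(toy_dists)

print("left-to-right order:   ", ltr)       # [0, 1, 2, 3]
print("confidence-first order:", flexible)  # [0, 2, 3, 1]
```

In this toy setting the uncertain position is decoded second under left-to-right order but last under the confidence-first order, after every other token is already fixed.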

Abstract

Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs, enabling token generation in arbitrary orders. Intuitively, this flexibility implies a solution space that strictly supersets the fixed autoregressive trajectory, theoretically unlocking superior reasoning potential for general tasks like mathematics and coding. Consequently, numerous works have leveraged reinforcement learning (RL) to elicit the reasoning capability of dLLMs. In this paper, we reveal a counter-intuitive reality: arbitrary order generation, in its current form, narrows rather than expands the reasoning boundary of dLLMs. We find that dLLMs tend to exploit this order flexibility to bypass high-uncertainty tokens that are crucial for exploration, leading to a premature collapse of the solution space. This observation motivates a rethink of RL approaches for dLLMs, where…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques