Thinking Out of Order: When Output Order Stops Reflecting Reasoning Order in Diffusion Language Models
Longxuan Yu, Yu Fu, Shaorong Zhang, Hui Liu, Mukund Varma T, Greg Ver Steeg, and Yue Dong

TL;DR
This paper demonstrates that masked diffusion language models (MDLMs) can produce output in an order that differs from the order in which they reason. They maintain accuracy even when the answer must be written before the reasoning, whereas autoregressive (AR) models lose substantial accuracy in that setting.
Contribution
It introduces the concept of order robustness in diffusion models and a new benchmark, ReasonOrderQA, to evaluate output order flexibility in reasoning tasks.
Findings
MDLMs show stable accuracy when output order is changed
AR models suffer large accuracy drops with non-standard output orders
MDLMs stabilize reasoning tokens earlier than answer tokens during generation
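To make the last finding concrete, one plausible way to measure when a token "stabilizes" is to record the model's prediction at every denoising step and find the earliest step after which that prediction never changes. The sketch assumes this definition, and the paper's exact metric may differ. `stabilization_step`, `mean_stabilization`, and the toy trajectory are hypothetical, and the tokens are made up for illustration rather than taken from the paper's results.

```python
from statistics import mean


def stabilization_step(trajectory, pos):
    """Earliest step index after which position `pos` never changes again.

    `trajectory` is a list of snapshots, one per denoising step; each
    snapshot is a list of the tokens predicted at every position.
    """
    final = trajectory[-1][pos]
    step = len(trajectory) - 1
    # Walk backwards while the earlier snapshot already agrees with the final token.
    while step > 0 and trajectory[step - 1][pos] == final:
        step -= 1
    return step


def mean_stabilization(trajectory, positions):
    """Average stabilization step over a span of positions."""
    return mean(stabilization_step(trajectory, p) for p in positions)


# Toy trajectory (made-up tokens): position 0 is the answer slot and
# positions 1-3 hold reasoning tokens. Each row is one denoising step.
trajectory = [
    ["?", "7", "x", "y"],
    ["?", "3", "4", "y"],
    ["9", "3", "4", "5"],
    ["12", "3", "4", "5"],
]

print(stabilization_step(trajectory, 0))          # answer span
print(mean_stabilization(trajectory, [1, 2, 3]))  # reasoning span
```

In this toy trajectory, the reasoning positions settle after about 1.33 steps on average. The answer settles only at the final step, which matches the pattern the finding describes.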
Abstract
Autoregressive (AR) language models enforce a fixed left-to-right generation order, creating a fundamental limitation when the required output structure conflicts with natural reasoning (e.g., producing answers before explanations due to presentation or schema constraints). In such cases, AR models must commit to an answer before generating the intermediate reasoning that would support it, a premature commitment forced by the rigid generation order. Masked diffusion language models (MDLMs), which iteratively refine all tokens in parallel, offer a way to decouple computation order from output structure. We validate this capability on GSM8K, Math500, and ReasonOrderQA, a benchmark we introduce with controlled difficulty and order-level evaluation. When prompts request answers before reasoning, AR models exhibit large accuracy gaps compared to standard chain-of-thought ordering (up to 67% relative drop), while MDLMs remain…
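A toy decoder can illustrate how an MDLM separates the order of computation from the order of the output. The sketch assumes confidence-ordered unmasking, one common sampling heuristic for MDLMs: at each step, the decoder commits the masked position whose prediction is most confident. It is not the authors' decoder. `toy_denoiser`, its confidence values, and the answer-before-reasoning layout are all hypothetical. The answer slot sits first in the output, but its confidence rises only as reasoning tokens are committed, so it is filled last.

```python
MASK = "[MASK]"

# Hypothetical output layout: the answer slot comes *before* the reasoning.
LAYOUT = ["Answer:", "A", "Because:", "r1", "r2", "r3"]
ANSWER_POS = 1
REASONING_CONF = {3: 0.9, 4: 0.8, 5: 0.6}  # made-up per-position confidences


def toy_denoiser(seq):
    """Hypothetical stand-in for one MDLM forward pass.

    Returns {position: (token, confidence)} for every masked position.
    The answer's confidence grows with the number of reasoning tokens
    already committed, mimicking an answer that depends on its reasoning.
    """
    done = sum(seq[p] != MASK for p in REASONING_CONF)
    preds = {}
    for pos, tok in enumerate(seq):
        if tok != MASK:
            continue
        if pos == ANSWER_POS:
            conf = 0.1 + 0.2 * done
        elif pos in REASONING_CONF:
            conf = REASONING_CONF[pos]
        else:
            conf = 1.0  # template tokens are trivially predictable
        preds[pos] = (LAYOUT[pos], conf)
    return preds


def decode(length, tokens_per_step=1):
    """Start fully masked; each step, commit the most confident predictions."""
    seq = [MASK] * length
    commit_order = []
    while MASK in seq:
        preds = toy_denoiser(seq)
        # Rank by confidence (ties broken by position), not by left-to-right order.
        best = sorted(preds, key=lambda p: (-preds[p][1], p))[:tokens_per_step]
        for pos in best:
            seq[pos] = preds[pos][0]
            commit_order.append(pos)
    return seq, commit_order


seq, commit_order = decode(len(LAYOUT))
print(seq)
print(commit_order)  # the answer slot (position 1) is committed last
```

The final sequence keeps the answer-first layout, while the commit order visits the reasoning positions before the answer. An AR decoder would instead have to emit position 1 immediately after position 0.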
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
