Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning

Earl J St Sauver

arXiv:2603.13243 · cs.AI · March 17, 2026

Open Access

TL;DR

This paper introduces plan conditioning, a training-free method that prepends a short natural-language plan, written by an autoregressive model, to a diffusion language model's prompt. It improves multi-step reasoning performance and stability on tasks such as GSM8K and HumanEval.

Contribution

It proposes plan conditioning, a technique that gives a diffusion language model globally visible context from the first denoising step. This targets a hypothesized coordination problem: autoregressive models build coherence token by token, whereas diffusion models must coordinate all positions simultaneously.

Findings

01

Plan conditioning improves the GSM8K accuracy of LLaDA-8B-Instruct from 75.6% to 87.2%.

02

It raises HumanEval performance by 12.8 percentage points (37.2% to 50.0%).

03

Diffusion models benefit 2-10x more from plans than autoregressive models.

Abstract

Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while diffusion models must coordinate all positions simultaneously. We propose plan conditioning, a training-free method that prepends a short (~100-token) natural-language plan from an AR model to the diffusion model's prompt. The plan serves as a frozen scaffold -- globally visible context that every token position can attend to from the first denoising step. On GSM8K, plan conditioning improves LLaDA-8B-Instruct from 75.6% to 87.2% (+11.6 percentage points), matching a same-size AR model (LLaMA 3.1 8B, 87.7%) despite a 6.4pp weaker baseline. On HumanEval, the gain is +12.8pp (37.2% to 50.0%), showing plans generalize to code. The same plans improve…
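The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt templates, the function names `build_conditioned_prompt` and `plan_then_diffuse`, and the word-level truncation standing in for the ~100-token budget are all assumptions. The two models are passed in as plain callables, so no specific library API is implied.

```python
from typing import Callable

# Hypothetical templates: the abstract does not give the exact prompt wording.
PLAN_REQUEST = "Write a short step-by-step plan (do not solve):\n{question}"
CONDITIONED_PROMPT = "Plan:\n{plan}\n\nProblem:\n{question}\n\nSolution:"


def build_conditioned_prompt(question: str, plan: str) -> str:
    """Place the plan ahead of the problem in the diffusion model's prompt."""
    return CONDITIONED_PROMPT.format(plan=plan.strip(), question=question.strip())


def plan_then_diffuse(
    question: str,
    ar_generate: Callable[[str], str],         # assumed: AR model, prompt -> text
    diffusion_generate: Callable[[str], str],  # assumed: dLLM, prompt -> text
    max_plan_words: int = 100,
) -> str:
    """Stage 1: the AR model writes a plan. Stage 2: the dLLM answers with it in context."""
    plan = ar_generate(PLAN_REQUEST.format(question=question))
    # Crude stand-in for a ~100-token budget: keep the first N whitespace-separated
    # words (this also flattens line breaks inside the plan).
    plan = " ".join(plan.split()[:max_plan_words])
    # The plan is fixed from here on; the diffusion model only sees it as prompt text.
    return diffusion_generate(build_conditioned_prompt(question, plan))
```

Because the plan is ordinary prompt text, neither model needs retraining in this sketch, which matches the training-free framing. Passing stub callables (for example, a lambda that returns a fixed plan and one that echoes its prompt) is enough to inspect the prompt that would be sent to the diffusion model.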

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques