Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning
Earl J St Sauver

TL;DR
This paper introduces plan conditioning, a training-free method that prepends a short natural-language plan, written by an autoregressive model, to a diffusion language model's prompt. The method improves multi-step reasoning performance and stability on tasks such as GSM8K and HumanEval.
Contribution
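It proposes plan conditioning, a technique that gives a diffusion language model globally visible context from the first denoising step. This targets the coordination problem the authors hypothesize: autoregressive models build coherence token by token, whereas diffusion models must coordinate all positions simultaneously.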
Findings
Plan conditioning improves LLaDA-8B-Instruct's GSM8K accuracy from 75.6% to 87.2% (+11.6 percentage points).
It raises HumanEval performance from 37.2% to 50.0% (+12.8 percentage points).
Diffusion models benefit 2-10x more from plans than autoregressive models.
Abstract
Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: autoregressive (AR) models build coherence token-by-token, while diffusion models must coordinate all positions simultaneously. We propose plan conditioning, a training-free method that prepends a short (~100-token) natural-language plan from an AR model to the diffusion model's prompt. The plan serves as a frozen scaffold: globally visible context that every token position can attend to from the first denoising step. On GSM8K, plan conditioning improves LLaDA-8B-Instruct from 75.6% to 87.2% (+11.6 percentage points), matching a same-size AR model (LLaMA 3.1 8B, 87.7%) despite a 6.4pp weaker baseline. On HumanEval, the gain is +12.8pp (37.2% to 50.0%), showing plans generalize to code. The same plans improve…
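The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the model callables, the prompt wording in `PLAN_INSTRUCTION`, the prompt layout, and the word-based length cap are all assumptions made for the example. Only the overall shape (an AR model writes a short plan, and the plan is prepended to the diffusion model's prompt without any training) comes from the text above.

```python
from typing import Callable

# Hypothetical stand-in: any callable that maps a prompt string to generated text.
# A real setup would wrap an AR planner and a diffusion LM behind this interface.
TextModel = Callable[[str], str]

# Assumed planner instruction; the paper's actual wording is not given here.
PLAN_INSTRUCTION = (
    "Write a short step-by-step plan for solving the problem below. "
    "Do not compute the final answer.\n\nProblem: {question}\n\nPlan:"
)


def truncate_words(text: str, max_words: int) -> str:
    """Crude length cap: whitespace-separated words stand in for tokens."""
    return " ".join(text.split()[:max_words])


def build_plan_conditioned_prompt(question: str, plan: str) -> str:
    """Put the plan ahead of the question so it is part of the fixed prompt.

    Because the plan sits in the prompt rather than in the region being
    denoised, it stays unchanged while the diffusion model fills in the answer.
    """
    return f"Plan:\n{plan}\n\nProblem:\n{question}\n\nSolution:"


def plan_then_diffuse(
    question: str,
    ar_planner: TextModel,
    diffusion_model: TextModel,
    max_plan_words: int = 100,
) -> str:
    """Stage 1: the AR model drafts a plan. Stage 2: the diffusion model answers."""
    raw_plan = ar_planner(PLAN_INSTRUCTION.format(question=question))
    plan = truncate_words(raw_plan, max_plan_words)
    prompt = build_plan_conditioned_prompt(question, plan)
    return diffusion_model(prompt)


if __name__ == "__main__":
    # Toy stubs so the sketch runs without any model weights.
    def toy_planner(prompt: str) -> str:
        return "1. Identify the quantities. 2. Add them. 3. State the result."

    def toy_diffusion(prompt: str) -> str:
        return f"[diffusion model would denoise an answer for]\n{prompt}"

    print(plan_then_diffuse("Ann has 2 apples and buys 3 more. How many now?",
                            toy_planner, toy_diffusion))
```

Keeping both models behind a plain callable interface reflects the training-free framing: neither model is modified, and the only coupling between them is the text of the plan.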
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
