WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning

Haojin Yang; Rui Hu; Zequn Sun; Rui Zhou; Yujun Cai; Yiwei Wang

arXiv:2511.19473·cs.LG·March 3, 2026

Open Access · 3 Reviews

TL;DR

WavefrontDiffusion introduces a dynamic decoding schedule that adaptively expands token contexts during generation, improving reasoning and semantic coherence in diffusion language models without increasing computational costs.

Contribution

The paper proposes WavefrontDiffusion, an adaptive decoding strategy that improves reasoning and semantic coherence in diffusion language models.

Findings

01. Achieves state-of-the-art results on reasoning and code generation benchmarks.

02. Produces outputs with higher semantic fidelity.

03. Maintains computational cost comparable to block-based methods.

Abstract

Diffusion Language Models (DLMs) have shown strong potential for text generation and are becoming a competitive alternative to autoregressive models. The denoising strategy plays an important role in determining the quality of their outputs. Mainstream denoising strategies include Standard Diffusion and BlockDiffusion. Standard Diffusion performs global denoising without restricting the update range, often finalizing incomplete context and causing premature end-of-sequence predictions. BlockDiffusion updates fixed-size blocks in a preset order, but its rigid structure can break apart coherent semantic units and disrupt reasoning. We present WavefrontDiffusion, a dynamic decoding approach that expands a wavefront of active tokens outward from finalized positions. This adaptive process follows the natural flow of semantic structure while keeping computational cost equal to block-based…
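The wavefront idea in the abstract can be illustrated with a toy schedule. This is a minimal sketch under stated assumptions, not the authors' algorithm: the parameter names `radius`, `max_active`, and `quota`, the `confidence` callable, and the rule "finalize the most confident active positions" are all hypothetical choices made for illustration. In a real diffusion language model the confidence would come from the model's token probabilities and would change at every step; here it is a fixed table.

```python
from typing import Callable, List, Set


def wavefront_schedule(
    seq_len: int,
    confidence: Callable[[int], float],
    radius: int = 2,
    max_active: int = 3,
    quota: int = 1,
) -> List[List[int]]:
    """Toy schedule: return the positions finalized at each decoding step."""
    if min(radius, max_active, quota) < 1:
        raise ValueError("radius, max_active and quota must all be >= 1")

    # Position -1 stands for the end of the prompt, which is already fixed.
    anchors: Set[int] = {-1}
    steps: List[List[int]] = []

    def distance(pos: int) -> int:
        # Distance from `pos` to the nearest finalized position.
        return min(abs(pos - a) for a in anchors)

    while len(anchors) - 1 < seq_len:
        # Wavefront (assumed rule): masked positions within `radius` of a
        # finalized one, nearest first, capped at `max_active`.
        reachable = [p for p in range(seq_len)
                     if p not in anchors and distance(p) <= radius]
        active = sorted(reachable, key=lambda p: (distance(p), p))[:max_active]
        # Finalize the `quota` most confident active positions.
        chosen = sorted(active, key=confidence, reverse=True)[:quota]
        anchors.update(chosen)
        steps.append(sorted(chosen))
    return steps


# Made-up confidences for a 6-token answer (illustrative only).
toy_confidence = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
schedule = wavefront_schedule(6, lambda p: toy_confidence[p])
print(schedule)  # [[1], [3], [2], [5], [4], [0]]
```

In this toy run the low-confidence position 0 is finalized last even though it sits next to the prompt, while a fixed block order would have been forced to commit to it early. Because the active set is always rebuilt from positions adjacent to finalized ones, the loop cannot stall while masked positions remain.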

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. WavefrontDiffusion dynamically adjusts the denoising process to follow the evolving semantic structure, preventing premature or fragmented token generation.
2. By expanding from finalized tokens outward, it ensures each token is generated with sufficient context, leading to smoother and more logically consistent outputs.
3. The method matches the computational cost of block-based decoding while delivering higher accuracy and better output quality.

Weaknesses

1. The paper lacks a baseline for its method. It needs to compare its approach with current decoding methods in the DLM field to demonstrate its advantages.
2. Experiments were only conducted on one model category. Similar experiments need to be performed on Dream for comparison.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The method is intuitive and addresses the limitations of hard boundaries in BlockDiffusion.
2. The research methodology is well-structured, with clear explanations of the wavefront theory, a four-step algorithm, and mathematical definitions. The experimental design covers four benchmark tests and evaluates the method using multiple metrics such as accuracy, BERTScore, and the MHCO indicator.
3. The experimental analysis is thorough and provides insights into parameter selection.

Weaknesses

1. The method is an incremental improvement over BlockDiffusion; both the methodology and the experimental results are incremental in nature.
2. Regarding the writing, Figure 1 is not sufficiently intuitive and requires further revision.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The method avoids premature EOS and half-baked spans by not locking in locally high-confidence tokens too early.
2. It completes semantically “ready” regions first (e.g., function signatures, reasoning steps) and is not hostage to rigid chunk boundaries.

Weaknesses

1. Only F and R are studied; there is no analysis of the per-step finalize quota k_t, nor strict equal FLOPs / equal token updates controls.
2. The setup is mostly zero-shot with T=1024 and temperature 0; it lacks length/temperature sweeps and multi-seed variance.
3. The very long context engineering story is unclear; the overhead of frontier maintenance and cache policies at extreme lengths is not evidenced.
4. Autoregressive baselines at matched latency are missing; there is no head-to-head ag

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare