Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

Jiyeon Kim; Sungik Choi; Yongrae Jo; Moontae Lee; Minjoon Seo

arXiv:2604.10567·cs.CL·April 14, 2026

TL;DR

This paper analyzes the inference dynamics of non-autoregressive decoding in diffusion language models. It finds a proximity bias in confidence-based decoding that makes the whole trajectory depend on the initial unmasking position, and it proposes a minimal intervention to improve performance on reasoning and planning tasks.

Contribution

The study identifies a proximity bias in non-autoregressive diffusion models and introduces a simple guiding approach to enhance their performance on complex tasks.

Findings

01 Proximity bias causes local error propagation in diffusion models.

02 Early token decisions critically influence the entire decoding trajectory.

03 The proposed method improves performance without added computational cost.

Abstract

Diffusion-based language models (dLLMs) have emerged as a promising alternative to autoregressive language models, offering the potential for parallel token generation and bidirectional context modeling. However, harnessing this flexibility for fully non-autoregressive decoding remains an open question, particularly for reasoning and planning tasks. In this work, we investigate non-autoregressive decoding in dLLMs by systematically analyzing its inference dynamics along the temporal axis. Specifically, we uncover an inherent failure mode in confidence-based non-autoregressive generation stemming from a strong proximity bias: the tendency for the denoising order to concentrate on spatially adjacent tokens. This local dependency leads to spatial error propagation, rendering the entire trajectory critically contingent on the initial unmasking position. Leveraging this insight, we present a…
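
To make the abstract's notion of proximity bias concrete, here is a minimal sketch of confidence-based unmasking together with one possible way to measure how local the resulting order is. It is an illustration under stated assumptions, not the paper's implementation: it assumes one position is unmasked per step, and the names `confidence_order`, `nearest_distances`, and `toy_scorer` are hypothetical. The toy scorer deliberately hard-codes a preference for positions next to already-unmasked ones, so it only exercises the metric and says nothing about how a real dLLM behaves.

```python
from typing import Callable, List, Set

# Hypothetical scorer type: (position, already-unmasked positions) -> confidence.
Scorer = Callable[[int, Set[int]], float]


def confidence_order(length: int, scorer: Scorer) -> List[int]:
    """Unmask one position per step, always picking the most confident masked one."""
    unmasked: Set[int] = set()
    order: List[int] = []
    while len(order) < length:
        masked = [p for p in range(length) if p not in unmasked]
        # Ties resolve to the leftmost position: max() keeps the first maximum.
        best = max(masked, key=lambda p: scorer(p, unmasked))
        order.append(best)
        unmasked.add(best)
    return order


def nearest_distances(order: List[int]) -> List[int]:
    """For each step after the first, distance to the closest earlier-unmasked position."""
    return [min(abs(p - q) for q in order[:i]) for i, p in enumerate(order) if i > 0]


def toy_scorer(start: int) -> Scorer:
    """Toy stand-in for a model (assumption): confidence decays with distance
    from the nearest unmasked position; the first pick is forced to `start`."""
    def score(pos: int, unmasked: Set[int]) -> float:
        if not unmasked:
            return 1.0 if pos == start else 0.0
        return 1.0 / min(abs(pos - q) for q in unmasked)
    return score


order = confidence_order(8, toy_scorer(start=5))
print(order)                     # unmasking order, beginning at position 5
print(nearest_distances(order))  # small values mean the order stays local
```

With this toy scorer every step lands next to an already-unmasked position, so the whole order is determined by the `start` argument. That mirrors, in a deliberately exaggerated form, the dependence on the initial unmasking position that the abstract describes.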
