Discrete Stochastic Localization for Non-autoregressive Generation

Yunshu Wu; Jiayi Cheng; Longxuan Yu; Partha Thakuria; Rob Brekelmans; Evangelos E. Papalexakis; Greg Ver Steeg

arXiv:2602.16169 · cs.LG · May 22, 2026

TL;DR

This paper introduces Discrete Stochastic Localization (DSL), a continuous-state framework for discrete sequence generation. Compared with masked discrete diffusion models, a single DSL-trained network supports a family of per-token SNR paths, and fine-tuning with DSL improves distributional faithfulness (MAUVE).

Contribution

The paper proposes DSL, a continuous-state framework with invariant denoising, enabling a single trained model to support multiple SNR paths and improve discrete sequence generation.

Findings

01. Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText across step budgets from $T = 128$ to $T = 1024$.

02. Supports multiple sampling methods, including autoregressive and hybrid approaches.

03. Achieves high-quality generation with fewer steps, without retraining.

Abstract

Continuous diffusion is a natural framework for non-autoregressive generation but has generally lagged behind masked discrete diffusion models (MDMs) on discrete sequence generation. We argue that the bottleneck is not continuity itself, but a representation in which denoising depends on timestep-indexed noise regimes. We introduce Discrete Stochastic Localization (DSL), a continuous-state framework with unit-sphere token embeddings whose Bayes-optimal denoiser is invariant to the nominal signal-to-noise ratio (SNR) under the localization channel. One trained network then supports an entire family of per-token SNR paths, with endpoint masked-diffusion paths as a special case. Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText across all step budgets from $T = 128$ to $T = 1024$, and the same checkpoint…
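One way to build intuition for the invariance claim is a single-token toy example. The sketch below is not the paper's implementation. It assumes the standard stochastic localization observation $y_t = t\,x + \sqrt{t}\,\varepsilon$ with Gaussian noise, a random unit-norm embedding table, and a uniform token prior. The function names, vocabulary size, and embedding dimension are all hypothetical. Under these assumptions the Gaussian log-likelihood expands to $\langle y, e_v \rangle - t\lVert e_v \rVert^2 / 2$ plus terms that do not depend on the token $v$, so with $\lVert e_v \rVert = 1$ the token posterior reduces to a softmax over $\langle y, e_v \rangle$ with no explicit dependence on $t$.

```python
import numpy as np


def unit_sphere_embeddings(vocab_size, dim, rng):
    """Random embedding table with every row normalized to unit length (assumed)."""
    e = rng.standard_normal((vocab_size, dim))
    return e / np.linalg.norm(e, axis=1, keepdims=True)


def localization_observation(x, t, rng):
    """Assumed localization channel: y_t = t * x + sqrt(t) * Gaussian noise."""
    return t * x + np.sqrt(t) * rng.standard_normal(x.shape)


def _softmax(logits):
    logits = logits - logits.max()  # stabilize before exponentiating
    p = np.exp(logits)
    return p / p.sum()


def posterior_with_t(y, t, emb, log_prior):
    """Token posterior from the full Gaussian likelihood N(y; t * e_v, t * I)."""
    sq_dist = np.sum((y[None, :] - t * emb) ** 2, axis=1)
    return _softmax(log_prior - sq_dist / (2.0 * t))


def posterior_without_t(y, emb, log_prior):
    """Same posterior computed from inner products only; t never appears."""
    return _softmax(log_prior + emb @ y)


rng = np.random.default_rng(0)
vocab_size, dim = 50, 16  # hypothetical toy sizes
emb = unit_sphere_embeddings(vocab_size, dim, rng)
log_prior = np.log(np.full(vocab_size, 1.0 / vocab_size))

max_gap = 0.0
for t in (0.5, 4.0, 32.0):
    token = rng.integers(vocab_size)
    y = localization_observation(emb[token], t, rng)
    gap = np.abs(posterior_with_t(y, t, emb, log_prior)
                 - posterior_without_t(y, emb, log_prior)).max()
    max_gap = max(max_gap, gap)

print("largest gap between the two posteriors:", max_gap)
```

The two posteriors agree up to floating-point error only because every embedding has the same norm. With unnormalized embeddings the term $t\lVert e_v \rVert^2 / 2$ would vary across tokens and the denoiser would need $t$ as an input. For full sequences the posterior is no longer a per-token softmax, so this toy case illustrates the mechanism rather than the paper's model.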

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis · Natural Language Processing Techniques