Discrete Stochastic Localization for Non-autoregressive Generation
Yunshu Wu, Jiayi Cheng, Longxuan Yu, Partha Thakuria, Rob Brekelmans, Evangelos E. Papalexakis, Greg Ver Steeg

TL;DR
This paper introduces Discrete Stochastic Localization (DSL), a novel framework for discrete sequence generation that improves upon masked discrete diffusion models by supporting flexible SNR paths and enhancing distributional faithfulness.
Contribution
The paper proposes DSL, a continuous-state framework with invariant denoising, enabling a single trained model to support multiple SNR paths and improve discrete sequence generation.
Findings
Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText.
Supports multiple sampling methods including autoregressive and hybrid approaches.
Achieves high-quality generation with fewer steps without retraining.
Abstract
Continuous diffusion is a natural framework for non-autoregressive generation but has generally lagged behind masked discrete diffusion models (MDMs) on discrete sequence generation. We argue that the bottleneck is not continuity itself, but a representation in which denoising depends on timestep-indexed noise regimes. We introduce Discrete Stochastic Localization (DSL), a continuous-state framework with unit-sphere token embeddings whose Bayes-optimal denoiser is invariant to the nominal signal-to-noise ratio (SNR) under the localization channel. One trained network then supports an entire family of per-token SNR paths, with endpoint masked-diffusion paths as a special case. Fine-tuning a pretrained MDLM checkpoint with DSL substantially improves distributional faithfulness (MAUVE) on OpenWebText across all evaluated step budgets, and the same checkpoint…
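The SNR-invariance claim in the abstract can be checked numerically for a single token. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes the localization channel has the standard stochastic-localization form y = αx + √α·ε with ε ~ N(0, I), which this summary does not spell out. The function names, the vocabulary size, and the embedding dimension are all hypothetical. With unit-norm embeddings e_k, the term α‖e_k‖²/2 in the Gaussian log-likelihood is the same for every token, so the posterior reduces to softmax(log π_k + ⟨y, e_k⟩), which contains no α.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    weights = [math.exp(v - m) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def unit(v):
    # Project a vector onto the unit sphere.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def posterior_full(y, alpha, embeddings, log_prior):
    # Posterior over tokens from the full likelihood N(y; alpha * e_k, alpha * I).
    # ASSUMPTION: this channel form is ours, chosen for illustration.
    logits = []
    for e, lp in zip(embeddings, log_prior):
        sq_dist = sum((yi - alpha * ei) ** 2 for yi, ei in zip(y, e))
        logits.append(lp - sq_dist / (2.0 * alpha))
    return softmax(logits)


def posterior_alpha_free(y, embeddings, log_prior):
    # Same posterior written without alpha; valid only for unit-norm embeddings.
    logits = [lp + sum(yi * ei for yi, ei in zip(y, e))
              for e, lp in zip(embeddings, log_prior)]
    return softmax(logits)


rng = random.Random(0)
dim, vocab = 8, 5  # hypothetical toy sizes
embeddings = [unit([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(vocab)]
log_prior = [math.log(1.0 / vocab)] * vocab  # uniform prior over tokens

max_gap = 0.0
for alpha in (0.1, 1.0, 10.0):
    x = embeddings[2]  # the clean token's embedding
    y = [alpha * xi + math.sqrt(alpha) * rng.gauss(0.0, 1.0) for xi in x]
    p_full = posterior_full(y, alpha, embeddings, log_prior)
    p_free = posterior_alpha_free(y, embeddings, log_prior)
    max_gap = max(max_gap, max(abs(a - b) for a, b in zip(p_full, p_free)))

print("largest gap between the two posteriors:", max_gap)
```

In this toy setting the two posteriors agree up to floating-point error at every α, which is the sense in which a denoiser that sees only y never needs to be told the noise level. If the embeddings were not normalized, the α‖e_k‖²/2 term would differ across tokens and the shortcut would fail.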
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis · Natural Language Processing Techniques
