Simple Denoising Diffusion Language Models

Huaisheng Zhu; Zhengyu Chen; Shijie Zhou; Zhihui Xie; Yige Yuan; Shiqi Chen; Zhimeng Guo; Siyuan Xu; Hangfan Zhang; Vasant Honavar; Teng Xiao

arXiv:2510.22926·cs.LG·February 4, 2026


TL;DR

This paper introduces a simplified denoising loss for Uniform State Diffusion Models (USDMs) that optimizes only noise-replaced tokens, stabilizing training while matching the performance of prior methods that use more complex objectives.

Contribution

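The paper proposes a simplified denoising loss for USDMs, computed only on noise-replaced tokens, and an efficient regularization term that mitigates corruption toward uniform output distributions. Together they avoid the additional computational overhead of the complex loss formulations used by earlier USDMs.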

Findings

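01 The simplified loss stabilizes training and matches the performance of prior methods with more complex objectives.

02 The regularization term mitigates corruption toward uniform output distributions and further improves performance.

03 The conclusions scale to larger models.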

Abstract

Recent Uniform State Diffusion Models (USDMs), initialized from a uniform prior, offer the promise of fast text generation due to their inherent self-correction ability compared to masked diffusion models. However, they still rely on complex loss formulations with additional computational overhead, which hinders scalability. In this work, we explore a simplified denoising-based loss for USDMs that optimizes only noise-replaced tokens, stabilizing training while matching the performance of prior methods with more complex objectives. In addition, we introduce an efficient regularization term to mitigate corruption toward uniform output distributions, which further improves performance. We demonstrate the effectiveness and efficiency of our simple and improved loss formulations by pretraining models on widely used text datasets for USDMs. More importantly, our conclusions scale to larger…
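The idea of a loss that "optimizes only noise-replaced tokens" can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `corrupt_uniform` and `replaced_token_loss`, the single `noise_rate` parameter, and the use of plain cross-entropy are all assumptions made for illustration, and the regularization term mentioned in the abstract is not modeled because the abstract does not specify it.

```python
import math
import random


def corrupt_uniform(tokens, noise_rate, vocab_size, rng):
    """Replace each token, with probability noise_rate, by a uniform vocabulary draw.

    Returns the noisy sequence and a flag per position marking where noise was
    drawn. A uniform draw can coincide with the original token; such positions
    are still flagged (an assumption of this sketch).
    """
    noisy, replaced = [], []
    for tok in tokens:
        flag = rng.random() < noise_rate
        noisy.append(rng.randrange(vocab_size) if flag else tok)
        replaced.append(flag)
    return noisy, replaced


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def replaced_token_loss(logits, targets, replaced):
    """Mean cross-entropy against the clean targets, over flagged positions only."""
    losses = [
        -log_softmax(row)[target]
        for row, target, flag in zip(logits, targets, replaced)
        if flag
    ]
    # No flagged positions means there is nothing to optimize in this sketch.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 50
    clean = [rng.randrange(vocab_size) for _ in range(8)]
    noisy, replaced = corrupt_uniform(clean, 0.5, vocab_size, rng)
    # Stand-in for a denoiser: uniform logits at every position.
    logits = [[0.0] * vocab_size for _ in clean]
    print(replaced_token_loss(logits, clean, replaced))
```

In this sketch, positions that kept their original token contribute nothing to the loss, so the gradient signal in a real model would come only from positions where noise was injected.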
