Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

Zhihan Yang; Wei Guo; Shuibai Zhang; Subham Sekhar Sahoo; Yongxin Chen; Arash Vahdat; Morteza Mardani; John Thickstun

arXiv:2605.18530 · cs.CL · May 19, 2026

TL;DR

This paper demonstrates that likelihood-trained continuous diffusion language models can scale competitively with discrete diffusion models, achieving state-of-the-art results among continuous DLMs and providing theoretical insights into their advantages.

Contribution

RePlaid, a likelihood-based continuous diffusion language model, is built by aligning the architecture of Plaid with modern discrete DLMs. It rivals discrete models in scalability and performance, challenging the belief that continuous diffusion scales worse than discrete approaches.

Findings

01

RePlaid exhibits a compute gap of only 20× relative to autoregressive models.

02

RePlaid outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime.

03

RePlaid sets a new state-of-the-art perplexity (PPL) bound of 22.1 among continuous DLMs on OpenWebText.
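
As a reading aid for the perplexity figure above, the following minimal sketch converts a perplexity into the per-token negative log-likelihood it implies. It assumes the standard definition of perplexity as the exponential of the average per-token negative log-likelihood. The function names are hypothetical and are not taken from the paper or its code. For likelihood-based diffusion models the reported number is typically an upper bound derived from a variational bound, so the values below should be read as bounds as well.

```python
import math


def ppl_to_nats_per_token(ppl: float) -> float:
    """Average per-token negative log-likelihood (nats) implied by a perplexity.

    Assumes the standard definition: ppl = exp(mean NLL per token).
    """
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


def ppl_to_bits_per_token(ppl: float) -> float:
    """Same quantity expressed in bits per token."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log2(ppl)


# 22.1 is the PPL bound quoted in the findings; everything else is illustrative.
replaid_ppl_bound = 22.1
nats = ppl_to_nats_per_token(replaid_ppl_bound)
bits = ppl_to_bits_per_token(replaid_ppl_bound)
print(f"{nats:.3f} nats/token, {bits:.3f} bits/token")
```

Perplexities are only comparable across models that share a tokenizer and evaluation split, so this conversion says nothing by itself about how RePlaid ranks against models evaluated under a different setup.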

Abstract

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20 \times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via…
