LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

Yuxin Chen; Chumeng Liang; Hangke Sui; Ruihan Guo; Chaoran Cheng; Jiaxuan You; Ge Liu

arXiv:2604.11748·cs.CL·April 16, 2026

TL;DR

LangFlow introduces a continuous diffusion language model that rivals discrete models by connecting embedding-space DLMs to Flow Matching, with novel evaluation bounds, noise scheduling, and training protocols.

Contribution

LangFlow is presented as the first continuous DLM to match discrete diffusion models in language modeling, supported by new training and evaluation methods.

Findings

01 LangFlow achieves a perplexity of 30.0 on LM1B and 24.6 on OpenWebText.

02 LangFlow exceeds autoregressive baselines in zero-shot transfer on 4 of 7 benchmarks.

03 LangFlow rivals top discrete DLMs in both likelihood and generative perplexity.
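
To make the perplexity figures easier to compare with a negative log-likelihood (NLL) bound, the sketch below converts between the two using the standard identity perplexity = exp(average NLL in nats per token). The function names are hypothetical, and it is an assumption that the reported numbers are per-token perplexities normalized in this way; the listing does not state the tokenizer or normalization.

```python
import math


def perplexity_to_nll(ppl: float) -> float:
    """Average NLL in nats per token implied by a per-token perplexity."""
    return math.log(ppl)


def nll_to_perplexity(nll_nats: float) -> float:
    """Per-token perplexity implied by an average NLL in nats per token."""
    return math.exp(nll_nats)


# Perplexities quoted in the findings above.
reported = [("LM1B", 30.0), ("OpenWebText", 24.6)]

for name, ppl in reported:
    nats = perplexity_to_nll(ppl)
    bits = nats / math.log(2)  # convert nats to bits
    print(f"{name}: {nats:.3f} nats/token, {bits:.3f} bits/token")
```

Because the conversion is monotonic, an upper bound on NLL translates directly into an upper bound on perplexity.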

Abstract

Continuous diffusion has been the foundation of high-fidelity, controllable, and few-step generation of many data modalities such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts due to the sparse data space and the underexplored design space. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete diffusion, by connecting embedding-space DLMs to Flow Matching via Bregman divergence, alongside three key innovations: (1) we derive a novel ODE-based NLL bound for principled evaluation of continuous flow-based language models; (2) we propose an information-uniform principle for setting the noise schedule, which motivates a learnable noise scheduler based on a Gumbel distribution; and (3) we revise prior training protocols by incorporating self-conditioning, as we find it improves…
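
As a rough illustration of what an embedding-space Flow Matching setup can look like, the sketch below builds a training pair on a linear path between Gaussian noise and token embeddings. Everything in it is an assumption made for illustration: the linear path, the Gaussian source, the uniform time sampling, the random embedding table, and the name `flow_matching_pair` are not taken from the paper. LangFlow's Bregman-divergence objective, Gumbel-based noise scheduler, and self-conditioning are not reproduced here.

```python
import numpy as np


def flow_matching_pair(x1, t, rng):
    """Return (x_t, velocity) on a linear noise-to-data path (illustrative).

    x1:  clean token embeddings, shape (batch, seq_len, dim)
    t:   one time in [0, 1] per sequence, shape (batch,)
    """
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise source (assumption)
    t = t.reshape(-1, 1, 1)             # broadcast over seq_len and dim
    x_t = (1.0 - t) * x0 + t * x1       # point on the straight path at time t
    velocity = x1 - x0                  # constant velocity of that path
    return x_t, velocity


rng = np.random.default_rng(0)

# Hypothetical toy setup: random embedding table and random token ids.
vocab_size, dim = 100, 16
embedding = rng.standard_normal((vocab_size, dim))
tokens = rng.integers(0, vocab_size, size=(4, 8))

x1 = embedding[tokens]       # (4, 8, 16) clean embeddings
t = rng.uniform(size=4)      # uniform times; a learned scheduler would replace this
x_t, v = flow_matching_pair(x1, t, rng)
```

In a typical setup of this kind, a network receives `x_t` and `t` and is trained to predict either the velocity or the clean tokens. Following the velocity from `x_t` for the remaining time `1 - t` recovers `x1` exactly on this path.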

Code & Models

Repositories

nealchen2003/LangFlow (GitHub)