CoDAR: Continuous Diffusion Language Models are More Powerful Than You Think
Junzhe Shen, Jieru Zhao, Ziwei He, Zhouhan Lin

TL;DR
This paper introduces CoDAR, a novel two-stage framework for continuous diffusion language models that enhances generation quality by addressing token rounding bottlenecks through a context-aware autoregressive decoder.
Contribution
The paper proposes CoDAR, which keeps diffusion entirely continuous in embedding space and replaces token rounding with a context-conditioned autoregressive discretizer, improving generation quality over existing latent diffusion models.
Findings
On LM1B and OpenWebText, CoDAR substantially improves generation quality over latent diffusion models.
It is competitive with strong discrete diffusion language models.
A simple decoder temperature controls fluency and diversity trade-offs.
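The temperature finding can be illustrated with a minimal sketch of temperature-scaled sampling, assuming the decoder produces next-token logits that are divided by a temperature before the softmax. This is the standard technique, not code from the paper; the function names and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token logits over a 4-word vocabulary (illustrative only).
logits = [2.0, 1.0, 0.5, -1.0]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
flat = softmax_with_temperature(logits, 2.0)   # high temperature: spread out
```

A low temperature concentrates probability on the top token, which tends to favor fluency. A high temperature spreads probability across more tokens, which tends to favor diversity.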
Abstract
We study why continuous diffusion language models (DLMs) have lagged behind discrete diffusion approaches despite their appealing continuous generative dynamics. Under a controlled token-recovery study, we identify token rounding, the final projection from denoised embeddings to tokens, as a primary bottleneck. Building on these insights, we propose CoDAR (Continuous Diffusion with Contextual AutoRegressive Decoder), a two-stage framework that keeps diffusion entirely continuous in an embedding space while learning a strong, context-conditional discretizer: an autoregressive Transformer decoder that cross-attends to the denoised embedding sequence and performs contextualized rounding to tokens. Experiments on LM1B and OpenWebText demonstrate that CoDAR substantially improves generation quality over latent diffusion and becomes competitive with strong discrete DLMs, while exposing a…
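To make the rounding bottleneck concrete, here is a minimal sketch of context-free token rounding, assuming a nearest-neighbor rule over the embedding table. The abstract does not specify the exact rounding rule used in the token-recovery study, so the rule, the function name, and the toy vectors are all assumptions for illustration.

```python
def round_to_tokens(denoised, embedding_table):
    """Map each denoised vector to the id of its nearest token embedding.

    Every position is rounded independently by squared Euclidean distance,
    so neighboring tokens have no influence on the choice.
    """
    token_ids = []
    for vec in denoised:
        dists = [
            sum((a - b) ** 2 for a, b in zip(vec, emb))
            for emb in embedding_table
        ]
        token_ids.append(dists.index(min(dists)))
    return token_ids

# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# Hypothetical denoised outputs: slightly noisy copies of tokens 0, 1, 2.
denoised = [[0.9, 0.1], [0.1, 0.8], [-0.7, 0.2]]
token_ids = round_to_tokens(denoised, embedding_table)
```

Because each position is decided in isolation, residual noise in a single embedding can flip its token regardless of context. As described in the abstract, CoDAR replaces this step with an autoregressive Transformer decoder that cross-attends to the whole denoised sequence.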
Taxonomy
Topics: Language and cultural evolution · Topic Modeling · Generative Adversarial Networks and Image Synthesis
