Reasoning with Latent Tokens in Diffusion Language Models

Andre He; Sean Welleck; Daniel Fried

arXiv:2602.03769·cs.LG·February 4, 2026

Open Access

TL;DR

This paper interprets the tokens that a diffusion language model predicts but does not decode in the current step as latent tokens, shows that varying their number trades inference speed against sample quality, and introduces methods to incorporate them into autoregressive models for improved reasoning.

Contribution

The paper introduces a method for modulating the number of latent tokens in diffusion models and demonstrates that integrating latent tokens into autoregressive models enhances reasoning performance.

Findings

1. Latent tokens enable a smooth tradeoff between inference speed and sample quality.
2. Modulating latent tokens improves reasoning task performance.
3. Introducing latent tokens into autoregressive models yields substantial improvements.

Abstract

Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference time. We trace this trade-off to a key mechanism: diffusion models are trained to jointly predict a distribution over all unknown tokens, including those that will not actually be decoded in the current step. Ablating this joint prediction yields faster inference but degrades performance, revealing that accurate prediction at the decoded position relies on joint reasoning about the distribution of undecoded tokens. We interpret these as latent tokens and introduce a method for modulating their number, demonstrating empirically that this enables a smooth tradeoff between inference speed and sample quality. Furthermore, we demonstrate that latent tokens…
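The mechanism described in the abstract can be illustrated with a toy decoding loop. The sketch below is not the authors' implementation: `toy_denoiser`, `decode_step`, `generate`, and the parameters `num_decode` and `num_latent` are hypothetical names, the "model" returns random distributions, and limiting the forward pass to a subset of masked positions is only one assumed way of varying the number of latent tokens. It shows how each step predicts distributions at positions that are decoded as well as at latent positions that stay masked, and how the number of latent positions changes the per-step cost.

```python
import random

MASK = None  # placeholder for a token that has not been decoded yet


def toy_denoiser(tokens, positions, vocab, rng):
    """Hypothetical stand-in for a diffusion LM forward pass.

    Returns a probability distribution over `vocab` for each requested
    masked position. A real model would condition on `tokens`; here the
    distributions are random because only the control flow matters.
    """
    dists = {}
    for pos in positions:
        weights = [rng.random() for _ in vocab]
        total = sum(weights)
        dists[pos] = {tok: w / total for tok, w in zip(vocab, weights)}
    return dists


def decode_step(tokens, vocab, num_decode, num_latent, rng):
    """Run one decoding step.

    The model predicts at `num_decode` positions that are committed and at
    up to `num_latent` further masked positions (the latent tokens) whose
    predictions are computed but not committed.
    Returns the updated sequence and the number of positions processed.
    """
    masked = [i for i, t in enumerate(tokens) if t is MASK]
    to_decode = masked[:num_decode]
    latent = masked[num_decode:num_decode + num_latent]

    dists = toy_denoiser(tokens, to_decode + latent, vocab, rng)

    new_tokens = list(tokens)
    for pos in to_decode:
        # Commit the most likely token at each decoded position.
        new_tokens[pos] = max(dists[pos], key=dists[pos].get)
    # Latent positions stay masked and are predicted again next step.

    observed = len(tokens) - len(masked)
    cost = observed + len(to_decode) + len(latent)  # assumed cost proxy
    return new_tokens, cost


def generate(length, vocab, num_decode=1, num_latent=0, seed=0):
    """Decode a fully masked sequence and sum the per-step cost proxy."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    total_cost = 0
    while any(t is MASK for t in tokens):
        tokens, cost = decode_step(tokens, vocab, num_decode, num_latent, rng)
        total_cost += cost
    return tokens, total_cost
```

Under this cost proxy, `generate(8, vocab, num_latent=0)` processes only the observed and decoded positions at each step, while `num_latent=7` processes the whole sequence at every step; intermediate values fall in between. The sketch captures only the compute side of the tradeoff, since the toy model's predictions do not depend on the latent positions.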

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications