Reasoning with Latent Tokens in Diffusion Language Models
Andre He, Sean Welleck, Daniel Fried

TL;DR
This paper explores how latent tokens in diffusion language models can be used to balance inference speed and quality, and introduces methods to incorporate them into autoregressive models for improved reasoning capabilities.
Contribution
The paper introduces a method for modulating the number of latent tokens in diffusion language models, and shows that adding latent tokens to autoregressive models improves reasoning performance.
Findings
Latent tokens enable a smooth tradeoff between inference speed and sample quality.
Modulating latent tokens improves reasoning task performance.
Introducing latent tokens into autoregressive models yields substantial improvements.
Abstract
Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference time. We trace this trade-off to a key mechanism: diffusion models are trained to jointly predict a distribution over all unknown tokens, including those that will not actually be decoded in the current step. Ablating this joint prediction yields faster inference but degrades performance, revealing that accurate prediction at the decoded position relies on joint reasoning about the distribution of undecoded tokens. We interpret these as latent tokens and introduce a method for modulating their number, demonstrating empirically that this enables a smooth tradeoff between inference speed and sample quality. Furthermore, we demonstrate that latent tokens…
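The trade-off described in the abstract can be illustrated with a toy cost count. The sketch below is not the authors' method: `select_positions` and `positions_processed` are hypothetical names, the left-to-right choice of which tokens to decode is an assumption, and "cost" is simply the number of positions fed to a model per step, not real FLOPs. It only shows how the number of undecoded (latent) positions kept in each forward pass changes the total work.

```python
from typing import List, Tuple


def select_positions(
    masked: List[int], num_decode: int, num_latent: int
) -> Tuple[List[int], List[int]]:
    """Split masked positions into those decoded now and those kept as latent tokens.

    Assumption: positions are taken left to right. A real sampler may
    choose them by confidence or at random.
    """
    ordered = sorted(masked)
    decode = ordered[:num_decode]
    latent = ordered[num_decode:num_decode + num_latent]
    return decode, latent


def positions_processed(seq_len: int, num_decode: int, num_latent: int) -> int:
    """Count positions passed through a hypothetical model over a full decode.

    Each step processes the already-known tokens, the tokens being decoded,
    and up to num_latent additional undecoded (latent) positions.
    """
    masked = list(range(seq_len))
    known = 0
    total = 0
    while masked:
        decode, latent = select_positions(masked, num_decode, num_latent)
        total += known + len(decode) + len(latent)
        known += len(decode)
        decoded = set(decode)
        masked = [p for p in masked if p not in decoded]
    return total


if __name__ == "__main__":
    for num_latent in (0, 1, 3):
        cost = positions_processed(seq_len=4, num_decode=1, num_latent=num_latent)
        print(f"latent tokens per step: {num_latent}, positions processed: {cost}")
```

Under this toy count, a length-4 sequence decoded one token per step processes 10 positions with no latent tokens and 16 when every undecoded position is kept, with intermediate settings falling in between. Whether quality degrades as latent tokens are removed is an empirical claim of the paper and is not modeled here.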
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
