Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion
Georgios Batzolis, Mark Girolami, Luca Ambrogioni

TL;DR
This paper introduces a continuous diffusion approach over binary bitstreams for language modeling, reporting state-of-the-art perplexity and efficiency and narrowing the gap to autoregressive models.
Contribution
The paper models text as a continuous diffusion process over fixed-width binary bitstreams and pairs it with entropy-gated stochastic sampling, improving performance and scalability.
Findings
Achieves a perplexity of 59.76 on LM1B with a 130M-parameter model.
Establishes a new Pareto frontier on OpenWebText while using fewer sampling steps.
Removes the vocabulary-scaling bottleneck by predicting logit bits.
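To make the vocabulary-scaling point concrete, the sketch below shows one way token IDs can be mapped to fixed-width bit sequences and back. It is an illustration, not the paper's implementation: the function names, the big-endian bit order, the {-1, +1} encoding of bits, and the 50,257-entry vocabulary (the GPT-2 BPE size, used only as an example) are all assumptions. The point is that the per-token output width grows like the base-2 logarithm of the vocabulary size rather than with the vocabulary size itself.

```python
from typing import List, Sequence


def bit_width(vocab_size: int) -> int:
    """Smallest number of bits that can index every token in the vocabulary."""
    if vocab_size < 1:
        raise ValueError("vocab_size must be positive")
    # (V - 1).bit_length() equals ceil(log2(V)) for V >= 2, using exact integer math.
    return max(1, (vocab_size - 1).bit_length())


def token_to_analog_bits(token_id: int, width: int) -> List[float]:
    """Encode a token ID as `width` real-valued bits in {-1.0, +1.0}, most significant first."""
    if not 0 <= token_id < (1 << width):
        raise ValueError("token_id does not fit in the given width")
    return [1.0 if (token_id >> shift) & 1 else -1.0
            for shift in range(width - 1, -1, -1)]


def analog_bits_to_token(bits: Sequence[float]) -> int:
    """Decode by thresholding each (possibly noisy) analog bit at zero."""
    token_id = 0
    for value in bits:
        token_id = (token_id << 1) | (1 if value > 0 else 0)
    return token_id


# Illustrative vocabulary size (assumption): 50,257 tokens need 16 bits per token.
width = bit_width(50257)
encoded = token_to_analog_bits(50256, width)
assert len(encoded) == 16
assert analog_bits_to_token(encoded) == 50256
```

Under these assumptions, a model head would emit 16 real values per position instead of 50,257 logits. Note that a 16-bit code has 65,536 patterns, so some decoded patterns would not correspond to valid tokens; how the paper handles that case is not stated in this summary.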
Abstract
Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in sample quality and diversity. Recent continuous flow and diffusion approaches over token embeddings have narrowed this gap, suggesting continuous state spaces are highly effective for language. In this work, we further close the autoregressive gap by modeling text as a continuous diffusion process over fixed-width binary bitstreams. Our approach represents semantic tokens as analog bit sequences and utilizes a matched-filter residual parameterization to isolate contextual learning from analytic independent-bit posteriors. Crucially, we adopt a stochastic sampler that applies Langevin-type corrections gated by the entropy-rate profile, automatically concentrating stochasticity in high-information regions while remaining…
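The "analytic independent-bit posterior" mentioned in the abstract can be illustrated with a standard result. Assume (this is an assumption, not a detail given here) that each bit x is uniformly distributed on {-1, +1} and is observed as z = alpha * x + sigma * eps with standard Gaussian noise eps. Then the posterior log-odds of x = +1 are 2 * alpha * z / sigma^2 and the posterior mean is tanh(alpha * z / sigma^2). The second function below shows one plausible reading of a residual parameterization, in which a learned context-dependent term is added to that closed-form log-odds; the name `residual_logit` and this exact form are hypothetical and are not confirmed by the abstract.

```python
import math


def independent_bit_posterior_mean(z: float, alpha: float, sigma: float) -> float:
    """E[x | z] for x uniform on {-1, +1} and z = alpha * x + sigma * noise."""
    return math.tanh(alpha * z / sigma ** 2)


def bit_posterior_mean_with_residual(z: float, alpha: float, sigma: float,
                                     residual_logit: float = 0.0) -> float:
    """Hypothetical sketch: closed-form log-odds plus a learned contextual correction.

    With residual_logit = 0 this reduces to the independent-bit posterior mean.
    """
    matched_filter_logit = 2.0 * alpha * z / sigma ** 2  # log p(x=+1|z) - log p(x=-1|z)
    return math.tanh(0.5 * (matched_filter_logit + residual_logit))


# A noisy observation that leans positive gives a posterior mean between 0 and 1.
mean = independent_bit_posterior_mean(z=0.7, alpha=0.8, sigma=0.6)
assert 0.0 < mean < 1.0
```

In this reading, a network would only need to learn the contextual correction, since the part of the posterior that depends on the noisy bit alone is already available in closed form.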
