Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs
Subham Sekhar Sahoo, Zhihan Yang, Yash Akhauri, Johnna Liu, Deepansha Singh, Zhoujun Cheng, Zhengzhong Liu, Eric Xing, John Thickstun, Arash Vahdat

TL;DR
This paper introduces Eso-LMs, a novel family of language models that combine autoregressive and masked diffusion approaches, enabling faster, more efficient, and controllable text generation with state-of-the-art speed-quality trade-offs.
Contribution
We propose Eso-LMs, which integrate AR and MDM paradigms using causal attention, allowing exact likelihood computation and KV caching for MDMs, thus improving inference efficiency.
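To illustrate why causal attention is what makes KV caching possible, here is a toy sketch in pure Python. It is not the Eso-LM architecture: it uses a single attention head with identity query/key/value projections, and every function name is hypothetical. It shows that under causal attention the outputs at earlier positions do not change when a token is appended, so their keys and values can be computed once and reused. Under bidirectional attention they do change, so in a multi-layer model the keys and values derived from them would have to be recomputed.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def causal_full(xs):
    """Recompute causal attention from scratch: position i sees positions 0..i."""
    return [attend(xs[i], xs[: i + 1], xs[: i + 1]) for i in range(len(xs))]

def bidirectional_full(xs):
    """Bidirectional attention: every position sees every position."""
    return [attend(x, xs, xs) for x in xs]

def causal_cached(xs):
    """Process tokens one at a time, reusing cached keys and values."""
    cache_k, cache_v, outputs = [], [], []
    for x in xs:
        # Assumption for this toy: keys and values are the embeddings themselves.
        cache_k.append(x)
        cache_v.append(x)
        outputs.append(attend(x, cache_k, cache_v))
    return outputs

# Hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prefix = tokens[:2]

# Appending a third token leaves earlier causal outputs untouched ...
causal_stable = causal_full(tokens)[:2] == causal_full(prefix)
# ... but changes earlier bidirectional outputs.
bidir_stable = bidirectional_full(tokens)[:2] == bidirectional_full(prefix)
# Incremental decoding with a cache reproduces the full causal computation.
cache_matches = causal_cached(tokens) == causal_full(tokens)

print(causal_stable, bidir_stable, cache_matches)
```

In this sketch the cached path does one attention call per new token instead of recomputing all positions, which is the general source of the savings from KV caching. How Eso-LMs order and cache tokens during parallel generation is specified in the paper, not here.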
Findings
Achieves a new state-of-the-art speed-quality Pareto frontier for unconditional generation.
14-65x faster inference on long contexts compared to standard MDMs.
3-4x faster inference than prior semi-autoregressive models.
Abstract
Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) currently perform best but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. We introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, smoothly interpolating between their perplexities while overcoming their respective limitations. Unlike prior work, which uses transformers with bidirectional attention as MDM denoisers, we exploit the connection between MDMs and Any-Order autoregressive models and adopt causal attention. This design lets us compute the exact likelihood of MDMs for the first time and, crucially, enables us to introduce KV caching for MDMs while preserving parallel generation for the first time,…
