TL;DR
Cola DLM introduces a hierarchical latent diffusion approach for text generation, separating global semantic organization from local textual realization, enabling more flexible and scalable language modeling beyond traditional autoregressive methods.
Contribution
This work presents a novel hierarchical latent diffusion language model that unifies semantic compression, prior fitting, and text generation in continuous space, outperforming autoregressive baselines.
Findings
Cola DLM shows strong scaling behavior at compute budgets up to 2000 EFLOPs, which supports the model's efficiency.
Hierarchical latent prior modeling improves generation quality over likelihood-based methods.
The approach supports extensions to other continuous modalities.
Abstract
Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual…
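The abstract names a block-causal DiT as the model of the latent prior but does not describe its attention pattern. As a minimal sketch only, assuming latents are grouped into fixed-size blocks and each position may attend to its own block and to all earlier blocks, such a mask could be built as follows. The function name `block_causal_mask`, the `block_size` parameter, and the sizes used here are illustrative assumptions, not details taken from the paper.

```python
def block_causal_mask(num_latents: int, block_size: int) -> list[list[bool]]:
    """Build a block-causal attention mask (True means "may attend").

    Assumption for illustration: latents are split into consecutive blocks
    of `block_size`; attention is unrestricted inside a block and causal
    across blocks.
    """
    if num_latents < 0 or block_size <= 0:
        raise ValueError("num_latents must be >= 0 and block_size must be > 0")
    # Block index of every latent position, e.g. [0, 0, 1, 1, 2, 2].
    block_ids = [i // block_size for i in range(num_latents)]
    # Query i may attend to key j when j's block is not later than i's block.
    return [[bi >= bj for bj in block_ids] for bi in block_ids]


mask = block_causal_mask(num_latents=6, block_size=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With `block_size=1` this reduces to an ordinary token-level causal mask, and with `block_size` equal to `num_latents` it becomes full bidirectional attention, so the block size sets how much of the sequence is treated as one unit.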
