TL;DR
This paper introduces a continuous flow-based language model that outperforms discrete diffusion models in quality and speed, enabling efficient few-step inference for large-scale language generation.
Contribution
It presents a novel continuous flow formulation for language modeling, the flow-based language model (FLM), which outperforms discrete diffusion methods in both quality and efficiency. It also introduces the flow map language model (FMLM) for few-step generation.
Findings
FLM matches state-of-the-art diffusion baselines on the LM1B and OpenWebText (OWT) datasets.
FMLM's one-step generation surpasses recent few-step diffusion models.
Continuous flows can effectively model discrete language data without discrete noising.
Abstract
Language models based on discrete diffusion have attracted widespread interest for their potential to provide faster generation than autoregressive models. Despite their promise, these models typically produce samples whose quality sharply degrades in the few-step regime, preventing a dramatic speedup in practice. Here, we show that language models based on continuous flows over one-hot token embeddings can outperform discrete diffusion in both quality and speed. Importantly, our continuous formulation defines a unique flow map that can be learned directly for efficient few-step inference, a structure we show is unavailable to discrete methods. In this setting, we show that both the flow and its associated flow map can be learned with simple cross-entropy objectives that respect the simplex geometry of the data, and we identify three distinct choices for flow map distillation whose…
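To make the abstract's ideas concrete, here is a minimal NumPy sketch of a continuous flow over one-hot token embeddings. It is not the paper's method. The linear Gaussian-to-one-hot interpolant, the velocity formula `(E[x1 | x_t] - x_t) / (1 - t)`, and every name here (`flow_loss`, `flow_map`, `sample`, `oracle_denoiser`) are assumptions chosen for illustration. The denoiser outputs logits, and its softmax is a point on the probability simplex, so it can be trained with plain cross-entropy against the clean tokens. `flow_map` approximates the flow map X_{s,t} by integrating the ODE with many small steps. A learned flow map would try to return the same result in a single network call.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def interpolate(x0, x1, t):
    """Assumed linear path: Gaussian noise x0 at t=0, one-hot data x1 at t=1."""
    return (1.0 - t) * x0 + t * x1

def flow_loss(denoiser, tokens, vocab, rng):
    """Cross-entropy between the denoiser's prediction from x_t and the clean tokens."""
    x1 = np.eye(vocab)[tokens]                 # one-hot embeddings, shape (L, V)
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform(0.0, 1.0)
    logp = log_softmax(denoiser(interpolate(x0, x1, t), t))
    return -logp[np.arange(len(tokens)), tokens].mean()

def velocity(denoiser, x_t, t):
    """Velocity implied by the denoiser's simplex-valued estimate of x1 (requires t < 1)."""
    x1_hat = softmax(denoiser(x_t, t))
    return (x1_hat - x_t) / (1.0 - t)

def flow_map(denoiser, x, s, t, steps=64):
    """Approximate X_{s,t}(x) by Euler-integrating dx/dtau = v(x, tau) from s to t."""
    taus = np.linspace(s, t, steps + 1)
    for a, b in zip(taus[:-1], taus[1:]):
        x = x + (b - a) * velocity(denoiser, x, a)
    return x

def sample(denoiser, seq_len, vocab, steps, rng):
    """Transport noise at t=0 to t=1, then read off tokens by argmax."""
    x0 = rng.standard_normal((seq_len, vocab))
    return flow_map(denoiser, x0, 0.0, 1.0, steps).argmax(axis=-1)

# Hypothetical stand-in for a trained network: it ignores x_t and puts
# almost all probability mass on a fixed target sequence.
target = np.array([3, 1, 4, 1, 5])

def oracle_denoiser(x_t, t):
    return 10.0 * np.eye(8)[target]

rng = np.random.default_rng(0)
print(sample(oracle_denoiser, seq_len=5, vocab=8, steps=4, rng=rng))  # → [3 1 4 1 5]
```

With the linear path, the final Euler step lands exactly on the denoiser's simplex estimate, which is why decoding with argmax recovers discrete tokens. The flow map also composes: X_{0,1} = X_{0.5,1} ∘ X_{0,0.5}. Under the assumptions above, this is the property a few-step model can exploit.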
