Scaling Categorical Flow Maps
Oscar Davis, Anastasiia Filippova, Pierre Ablin, Victor Turrisi, Amitis Shidani, Marco Cuturi, Louis Béthune

TL;DR
This paper demonstrates the scalability of Categorical Flow Maps (CFMs) for large language models, achieving high-quality text generation with fewer inference steps and providing new insights into their training and evaluation.
Contribution
It trains a 1.7B-parameter CFM on 2.1T tokens, introduces a likelihood bound for CFMs, and offers practical guidance for scaling these models.
Findings
Achieves high-quality text generation in as few as 4 steps.
Introduces a likelihood bound for semi-discrete CFMs.
Provides insights on training challenges and optimization strategies.
Abstract
Continuous diffusion and flow matching models could represent a powerful alternative to autoregressive approaches for language modelling (LM), as they unlock a host of advantages currently reserved for continuous modalities, including accelerated sampling and tilting. Recently, several works have demonstrated the possibility of generating discrete data continuously by a simple flow matching process between a Gaussian and the one-hot encoded data distribution. They have further shown the feasibility of accelerated sampling via Categorical Flow Maps (CFMs), resulting in competitive sample quality in the few-step regime. However, this method had only been evaluated at relatively modest scales (B), leaving the question of its scalability completely open. In this article, we train a 1.7B-parameter base flow model on 2.1T tokens and self-distill it into a CFM that generates diverse,…
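To make the "flow matching process between a Gaussian and the one-hot encoded data distribution" concrete, here is a minimal NumPy sketch. It assumes the standard linear interpolant and its conditional velocity target; the abstract does not say which interpolant the paper uses, and the helper names `one_hot` and `interpolate`, the toy vocabulary size, and the toy token sequence are all hypothetical.

```python
import numpy as np

def one_hot(tokens, vocab_size):
    """Encode integer token ids of shape (seq_len,) as one-hot rows."""
    return np.eye(vocab_size)[tokens]

def interpolate(x0, x1, t):
    """Linear interpolant (an assumption): noise x0 at t=0, data x1 at t=1."""
    return (1.0 - t) * x0 + t * x1

rng = np.random.default_rng(0)
vocab_size = 5                         # hypothetical toy vocabulary
tokens = np.array([2, 0, 4])           # hypothetical toy sequence

x1 = one_hot(tokens, vocab_size)       # data endpoint: one-hot encoded tokens
x0 = rng.standard_normal(x1.shape)     # noise endpoint: Gaussian sample
t = 0.3
xt = interpolate(x0, x1, t)            # point on the path at time t

# Under the linear interpolant, the conditional velocity is constant in t.
# A flow model would be regressed onto this target at (xt, t).
velocity_target = x1 - x0

# Decoding the data endpoint back to tokens is an argmax over the vocabulary axis.
decoded = x1.argmax(axis=-1)
```

Following the velocity target from `xt` for the remaining time `1 - t` lands exactly on the one-hot endpoint. A flow map, as the abstract describes it, aims to make such jumps in a few learned steps rather than by integrating many small ones.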
