Scaling Categorical Flow Maps

Oscar Davis; Anastasiia Filippova; Pierre Ablin; Victor Turrisi; Amitis Shidani; Marco Cuturi; Louis Béthune

arXiv:2605.07820 · cs.LG · May 12, 2026

TL;DR

This paper demonstrates the scalability of Categorical Flow Maps (CFMs) for large language models, achieving high-quality text generation with fewer inference steps and providing new insights into their training and evaluation.

Contribution

It trains a 1.7B-parameter CFM on 2.1T tokens, introduces a likelihood bound for CFMs, and offers practical guidance for scaling these models.

Findings

01. Achieves high-quality text generation in as few as 4 steps.

02. Introduces a likelihood bound for semi-discrete CFMs.

03. Provides insights on training challenges and optimization strategies.

Abstract

Continuous diffusion and flow matching models could represent a powerful alternative to autoregressive approaches for language modelling (LM), as they unlock a host of advantages currently reserved for continuous modalities, including accelerated sampling and tilting. Recently, several works have demonstrated the possibility of generating discrete data continuously by a simple flow matching process between a Gaussian and the one-hot encoded data distribution. They have further shown the feasibility of accelerated sampling via Categorical Flow Maps (CFMs), resulting in competitive sample quality in the few-step regime. However, this method had only been evaluated at relatively modest scales ($<1$B), leaving the question of its scalability completely open. In this article, we train a $1.7$B-parameter base flow model on $2.1$T tokens and self-distill it into a CFM that generates diverse,…
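
The flow matching process between a Gaussian and one-hot encoded data mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the common linear interpolant x_t = (1 - t) x_0 + t x_1 with a velocity target x_1 - x_0, and the helper names (`one_hot`, `interpolate`, `target_velocity`, `make_training_pair`) are hypothetical. It covers only how one training pair might be built for a single token, not the model, the self-distillation into a CFM, or few-step sampling.

```python
import random


def one_hot(token_id, vocab_size):
    """Return the one-hot vector for token_id as a list of floats."""
    return [1.0 if i == token_id else 0.0 for i in range(vocab_size)]


def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data).

    Assumption: a linear interpolant, x_t = (1 - t) * x0 + t * x1.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Time derivative of the linear path above, which is x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def make_training_pair(token_id, vocab_size, rng):
    """Sample Gaussian noise and a time t, then return (x_t, t, velocity)."""
    x0 = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]  # Gaussian endpoint
    x1 = one_hot(token_id, vocab_size)                     # data endpoint
    t = rng.random()                                       # time in [0, 1)
    return interpolate(x0, x1, t), t, target_velocity(x0, x1)


rng = random.Random(0)
x_t, t, v = make_training_pair(token_id=2, vocab_size=5, rng=rng)
```

Under this assumed interpolant, a network would be trained to regress `v` from `(x_t, t)`. Following the velocity from `x_t` for the remaining time `1 - t` lands exactly on the one-hot vector, which is what makes the path "simple".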
