TL;DR
The paper introduces $\mathbb{S}$-FLM, a hyperspherical flow language model that improves sequence generation efficiency and semantic interpretability over traditional flow models, especially in reasoning tasks.
Contribution
It proposes a novel latent FLM in hyperspherical space that generates sequences via rotations, reducing computational overhead and enhancing performance in reasoning tasks.
Findings
$\mathbb{S}$-FLM improves large-vocabulary reasoning performance.
It closes the gap to masked diffusion models at standard temperature.
It remains less effective at low-temperature decoding.
Abstract
Discrete Diffusion Language Models have progressed rapidly as an alternative to autoregressive (AR) models, motivated by their parallel generation abilities. However, for tractability, discrete diffusion models sample from a factorized distribution, which is less expressive than AR. Recent Flow Language Models (FLMs) apply continuous flows to language, transporting noise to data with a deterministic ODE that avoids factorized sampling. FLMs operate on one-hot vectors whose dimension scales with the vocabulary size, making FLMs costly to train. Moreover, since all distinct one-hot embeddings are equidistant from one another, adding Gaussian noise does not have a clear semantic interpretation (unlike images, where Gaussian noise progressively degrades structure). We introduce $\mathbb{S}$-FLM, a latent FLM in the hypersphere. $\mathbb{S}$-FLM generates sequences by rotating vectors in…
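Two geometric points in the abstract can be checked numerically: distinct one-hot vectors are all the same Euclidean distance apart, and a rotation between two unit vectors never leaves the sphere. The sketch below is illustrative only and is not the paper's implementation. The helper names (`one_hot`, `euclidean_distance`, `normalize`, `slerp`), the toy vocabulary size, and the 3-dimensional example vectors are all assumptions, and spherical linear interpolation (slerp) is used here as a generic stand-in for "rotating vectors", since the excerpt does not specify the actual flow.

```python
import math


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a nonzero vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def slerp(u, v, t):
    """Rotate unit vector `u` toward unit vector `v` along their great circle.

    Hypothetical helper: generic spherical interpolation, not the paper's flow.
    `t` runs from 0 (returns u) to 1 (returns v).
    """
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(u, v))))
    theta = math.acos(dot)  # angle between u and v
    s = math.sin(theta)
    if s < 1e-8:
        if dot > 0:
            return list(u)  # u and v coincide; nothing to rotate
        raise ValueError("antipodal vectors have no unique great circle")
    a = math.sin((1.0 - t) * theta) / s
    b = math.sin(t * theta) / s
    return [a * x + b * y for x, y in zip(u, v)]


# 1) Every pair of distinct one-hot vectors is sqrt(2) apart,
#    so Euclidean distance carries no information about token similarity.
vocab_size = 5  # toy vocabulary size (assumption)
distances = [
    euclidean_distance(one_hot(i, vocab_size), one_hot(j, vocab_size))
    for i in range(vocab_size)
    for j in range(i + 1, vocab_size)
]

# 2) A rotation from a "noise" direction to a "data" direction stays on the sphere.
noise = normalize([0.3, -1.2, 0.5])   # arbitrary example vector (assumption)
target = normalize([1.0, 0.2, -0.4])  # arbitrary example vector (assumption)
path = [slerp(noise, target, k / 4) for k in range(5)]
norms = [math.sqrt(sum(x * x for x in p)) for p in path]
```

Every entry of `distances` equals `math.sqrt(2)` up to floating-point error, and every entry of `norms` equals 1, so each intermediate point on the path is itself a valid point on the hypersphere.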
