TL;DR
SONAR-LLM is a decoder-only transformer that generates text by predicting sentence embeddings in the continuous SONAR space. It combines the semantic abstraction of the Large Concept Model (LCM) with likelihood-based training and attains competitive generation quality at model sizes from 39M to 1.3B parameters.
Contribution
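Introduces SONAR-LLM, a hybrid model that predicts sentence-level SONAR embeddings but is supervised with token-level cross-entropy propagated through the frozen SONAR decoder. This removes the diffusion sampler used by the earlier Large Concept Model and restores a likelihood-based training signal.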
Findings
Achieves competitive generation quality across model sizes from 39M to 1.3B parameters.
Retains semantic abstraction while using likelihood-based training.
Reports scaling trends, ablations, and benchmark results, and releases the complete training code and all pretrained checkpoints.
Abstract
The recently proposed Large Concept Model (LCM) generates text by predicting a sequence of sentence-level embeddings and training with either mean-squared error or diffusion objectives. We present SONAR-LLM, a decoder-only transformer that "thinks" in the same continuous SONAR embedding space, yet is supervised through token-level cross-entropy propagated via the frozen SONAR decoder. This hybrid objective retains the semantic abstraction of LCM while eliminating its diffusion sampler and restoring a likelihood-based training signal. Across model sizes from 39M to 1.3B parameters, SONAR-LLM attains competitive generation quality. We report scaling trends, ablations, benchmark results, and release the complete training code and all pretrained checkpoints to foster reproducibility and future research.
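The hybrid objective described in the abstract can be illustrated with a toy sketch. This is not the authors' code: the frozen decoder here is a hypothetical per-position linear map (the actual SONAR decoder is a neural sequence decoder), and every name and size below is an assumption chosen for illustration. The sketch only shows the difference between an LCM-style loss measured in embedding space and a SONAR-LLM-style loss measured on tokens after passing the predicted embedding through a frozen decoder.

```python
import math
import random

EMBED_DIM = 4    # toy size; real sentence embeddings are far larger
VOCAB_SIZE = 5   # toy vocabulary
NUM_TOKENS = 3   # number of tokens in the target sentence

rng = random.Random(0)

# Hypothetical stand-in for the frozen decoder: one weight matrix per token
# position. These weights are never updated during training.
DECODER_WEIGHTS = [
    [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]
    for _ in range(NUM_TOKENS)
]


def frozen_decoder_logits(embedding, weights):
    """Map one sentence embedding to a list of per-token logit vectors."""
    return [
        [sum(w * e for w, e in zip(row, embedding)) for row in position]
        for position in weights
    ]


def token_cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of the target tokens (stable log-softmax)."""
    total = 0.0
    for step_logits, target in zip(logits, target_ids):
        peak = max(step_logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
        total += log_z - step_logits[target]
    return total / len(target_ids)


def mse(a, b):
    """Mean-squared error between two embeddings (the LCM-style objective)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Pretend these came from the transformer and from encoding the true sentence.
predicted = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
target_embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
TARGET_IDS = [2, 0, 4]  # token ids of the true next sentence (made up)

# LCM-style signal: distance in embedding space.
loss_mse = mse(predicted, target_embedding)

# SONAR-LLM-style signal: likelihood of the true tokens under the frozen decoder.
loss_ce = token_cross_entropy(
    frozen_decoder_logits(predicted, DECODER_WEIGHTS), TARGET_IDS
)

print(f"embedding-space MSE: {loss_mse:.4f}")
print(f"token-level cross-entropy: {loss_ce:.4f}")
```

In a real training loop the cross-entropy gradient would flow back through the frozen decoder into the transformer that produced `predicted`, while the decoder weights themselves stay fixed. The sketch computes only the forward losses and omits that backward pass.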
