Scaling Latent Reasoning via Looped Language Models
Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu

TL;DR
Ouro introduces a family of pre-trained Looped Language Models (LoopLM) that build reasoning into pre-training through iterative latent computation. The 1.4B and 2.6B models match LLMs of up to 12B parameters thanks to stronger knowledge manipulation, and they produce reasoning traces more aligned with final outputs than explicit chain-of-thought.
Contribution
This work presents Ouro, a family of pre-trained Looped Language Models that build reasoning into pre-training via iterative latent computation and an entropy-regularized objective for learned depth allocation, with training scaled to 7.7T tokens.
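As a rough illustration of "iterative latent computation", the toy sketch below applies one weight-shared block repeatedly to a hidden state, so compute grows with the number of loops while the parameter count stays fixed. It is not Ouro's implementation: the single residual tanh update standing in for a Transformer layer stack, and the names `shared_block` and `looped_forward`, are hypothetical simplifications.

```python
import math
from typing import List

Vector = List[float]


def shared_block(h: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical stand-in for a Transformer layer stack: h + tanh(W h)."""
    return [
        h_i + math.tanh(sum(w * x for w, x in zip(row, h)))
        for h_i, row in zip(h, weights)
    ]


def looped_forward(h0: Vector, weights: List[Vector], num_loops: int) -> List[Vector]:
    """Apply the SAME block num_loops times and keep the state after each loop.

    The parameters (weights) are reused on every iteration, so adding loops
    adds computation but no new parameters.
    """
    states: List[Vector] = []
    h = h0
    for _ in range(num_loops):
        h = shared_block(h, weights)
        states.append(h)
    return states


if __name__ == "__main__":
    toy_weights = [[0.1, 0.3], [-0.4, 0.2]]  # hypothetical 2x2 weights
    for step, state in enumerate(looped_forward([0.5, -0.2], toy_weights, 4), start=1):
        print(step, [round(v, 4) for v in state])
```

Each entry of the returned list is the latent state after one more pass through the same weights. In a LoopLM the assumption is that the state at any step can be decoded by the language-model head, which is what makes per-step losses and early exit possible.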
Findings
Ouro 1.4B and 2.6B models match the performance of SOTA LLMs with up to 12B parameters across a wide range of benchmarks.
This advantage stems from superior knowledge manipulation, not from increased knowledge capacity.
LoopLM produces reasoning traces more aligned with final outputs than explicit chain-of-thought (CoT).
Abstract
Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. Ouro 1.4B and 2.6B models achieve superior performance, matching the results of SOTA LLMs with up to 12B parameters across a wide range of benchmarks. Through controlled experiments, we show that this advantage stems not from increased knowledge capacity but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than…
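Item (ii), the entropy-regularized objective for learned depth allocation, can be sketched as follows, assuming an exit gate emits a per-step exit probability λ_t and the loss is the expected per-step language-modeling loss under the induced exit distribution minus β times that distribution's entropy. This form, and the names `exit_distribution`, `entropy`, and `entropy_regularized_loss`, are assumptions for illustration rather than the paper's verbatim formulation.

```python
import math
from typing import List


def exit_distribution(gates: List[float]) -> List[float]:
    """Turn per-step exit probabilities (lambda_t) into a distribution over exit steps.

    p(t) = gates[t] * prod_{j<t} (1 - gates[j]) for every step except the last;
    the last step absorbs all remaining "survival" mass, so gates[-1] is unused
    and the result always sums to 1.
    """
    probs: List[float] = []
    survive = 1.0
    for g in gates[:-1]:
        probs.append(g * survive)
        survive *= 1.0 - g
    probs.append(survive)
    return probs


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability steps."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_loss(step_losses: List[float], gates: List[float], beta: float) -> float:
    """Assumed objective: E_{t ~ p}[loss_t] - beta * H(p)."""
    if len(step_losses) != len(gates):
        raise ValueError("need one gate value per recurrent step")
    probs = exit_distribution(gates)
    expected = sum(p * loss for p, loss in zip(probs, step_losses))
    return expected - beta * entropy(probs)


if __name__ == "__main__":
    gates = [0.5, 0.5, 0.5, 0.5]          # hypothetical gate outputs for 4 loops
    losses = [2.4, 2.1, 2.0, 1.95]        # hypothetical per-step LM losses
    print(exit_distribution(gates))
    print(entropy_regularized_loss(losses, gates, beta=0.05))
```

With β = 0 the objective simply favors whichever depth has the lowest loss; a positive β rewards spreading probability mass across depths, which, under this assumed form, discourages the gate from collapsing onto a single loop count.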
Code & Models
- 🤗 ByteDance/Ouro-1.4B (model) · 24k dl · ♡ 80
- 🤗 ByteDance/Ouro-2.6B-Thinking (model) · 6.1k dl · ♡ 102
- 🤗 ByteDance/Ouro-1.4B-Thinking (model) · 3.4k dl · ♡ 33
- 🤗 ByteDance/Ouro-2.6B (model) · 2.8k dl · ♡ 69
- 🤗 scpalmetto/Ouro-2.6B-Thinking-Fixed (model) · 925 dl · ♡ 2
- 🤗 KristianS7/Ouro-1.4B (model) · 193 dl
- 🤗 KristianS7/Ouro-1.4B-Thinking (model) · 296 dl
- 🤗 lettersandpatterns/Ouro-1.4B-Thinking-patched (model) · 281 dl
