LoopRPT: Reinforcement Pre-Training for Looped Language Models
Guo Tang, Shixin Jiang, Heng Chang, Nuo Chen, Yuhan Li, Huiming Fan, Jia Li, Ming Liu, Bing Qin

TL;DR
LoopRPT introduces a reinforcement pre-training method for LoopLMs that directly optimizes latent reasoning steps, leading to improved reasoning efficiency and accuracy, especially on challenging tokens.
Contribution
This work presents a novel reinforcement pre-training framework tailored for LoopLMs, enabling direct shaping of intermediate representations and more efficient latent reasoning.
Findings
LoopRPT improves per-step representation quality across model scales.
Significant gains on hard tokens indicate enhanced early-stage reasoning.
LoopRPT achieves better accuracy-computation trade-offs, indicating more efficient latent reasoning.
Abstract
Looped language models (LoopLMs) perform iterative latent computation to refine internal representations, offering a promising alternative to explicit chain-of-thought (CoT) reasoning. However, existing reinforcement learning (RL) paradigms primarily target output tokens, creating a structural mismatch with looped architectures whose reasoning unfolds implicitly. In this work, we propose LoopRPT, a reinforcement pre-training framework tailored for LoopLMs. By reframing next-token prediction as a next-token reasoning task, LoopRPT assigns reinforcement signals directly to latent steps using an EMA teacher reference and noisy latent rollouts. This formulation enables RL to directly shape intermediate representations, compressing effective reasoning into fewer iterations. We instantiate LoopRPT on the Ouro architecture across multiple model scales. Results demonstrate that LoopRPT…
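The abstract names three ingredients: an EMA teacher reference, noisy latent rollouts, and reinforcement signals assigned to latent steps. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. Every function name, the Gaussian noise model, the decay value, and the choice of reward (student log-probability of the target token minus the teacher's) are assumptions made for illustration only.

```python
import math
import random


def ema_update(teacher, student, decay=0.99):
    """Exponential moving average: nudge each teacher weight toward the student.

    Assumption: parameters are flat lists of floats; `decay` is illustrative.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def noisy_rollouts(latent, num_rollouts, sigma, rng):
    """Sample perturbed copies of a latent state (assumed Gaussian noise)."""
    return [[h + rng.gauss(0.0, sigma) for h in latent]
            for _ in range(num_rollouts)]


def step_rewards(student_logits_per_step, teacher_logits, target):
    """One reward per loop step (hypothetical reward definition).

    Reward = student log-prob of the target token at that step
             minus the teacher's reference log-prob of the same token.
    """
    reference = log_softmax(teacher_logits)[target]
    return [log_softmax(logits)[target] - reference
            for logits in student_logits_per_step]


if __name__ == "__main__":
    rng = random.Random(0)
    rollouts = noisy_rollouts([0.5, -0.2, 1.0], num_rollouts=4, sigma=0.1, rng=rng)
    print(len(rollouts), "perturbed latents")

    # Toy logits for two loop steps over a 3-token vocabulary.
    per_step = [[0.1, 0.2, 0.3], [0.0, 0.1, 2.0]]
    print(step_rewards(per_step, teacher_logits=[0.1, 0.2, 0.3], target=2))
```

Under these assumptions, a step whose prediction already matches the teacher reference receives zero reward, and later steps are rewarded only to the extent that they beat the reference, which is one way a per-step signal could favor reaching a good prediction in fewer iterations.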
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare
