LoopRPT: Reinforcement Pre-Training for Looped Language Models

Guo Tang; Shixin Jiang; Heng Chang; Nuo Chen; Yuhan Li; Huiming Fan; Jia Li; Ming Liu; Bing Qin

arXiv:2603.19714 · cs.CL · March 23, 2026

Open Access

TL;DR

LoopRPT introduces a reinforcement pre-training method for LoopLMs that directly optimizes latent reasoning steps, leading to improved reasoning efficiency and accuracy, especially on challenging tokens.

Contribution

This work presents a novel reinforcement pre-training framework tailored for LoopLMs, enabling direct shaping of intermediate representations and more efficient latent reasoning.

Findings

1. LoopRPT improves per-step representation quality across model scales.

2. Significant gains on hard tokens indicate enhanced early-stage reasoning.

3. LoopRPT achieves better accuracy-computation trade-offs, demonstrating efficiency.

Abstract

Looped language models (LoopLMs) perform iterative latent computation to refine internal representations, offering a promising alternative to explicit chain-of-thought (CoT) reasoning. However, existing reinforcement learning (RL) paradigms primarily target output tokens, creating a structural mismatch with looped architectures whose reasoning unfolds implicitly. In this work, we propose LoopRPT, a reinforcement pre-training framework tailored for LoopLMs. By reframing next-token prediction as a next-token reasoning task, LoopRPT assigns reinforcement signals directly to latent steps using an EMA teacher reference and noisy latent rollouts. This formulation enables RL to directly shape intermediate representations, compressing effective reasoning into fewer iterations. We instantiate LoopRPT on the Ouro architecture across multiple model scales. Results demonstrate that LoopRPT…
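As a rough illustration of two ingredients the abstract names, an EMA teacher and noisy latent rollouts, the toy sketch below shows the standard exponential-moving-average parameter update and a weight-shared block applied for several loops with optional Gaussian noise on the latent state. It is not the authors' implementation. The function names `ema_update` and `looped_forward`, the scalar `weight`, the tanh block, and the decay and noise values are all assumptions made for illustration. The reward and policy-update parts of LoopRPT are not described in the abstract and are left out.

```python
import math
import random


def ema_update(teacher, student, decay=0.99):
    """Standard EMA update: teacher <- decay * teacher + (1 - decay) * student.

    `teacher` and `student` are flat lists of floats standing in for
    model parameters (a hypothetical simplification).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def looped_forward(h, weight, num_loops, noise_std=0.0, rng=None):
    """Apply one shared toy block `num_loops` times and keep every latent state.

    The block here is just tanh(weight * x); a real looped model would reuse
    a stack of transformer layers instead. If `noise_std` > 0, Gaussian noise
    is added to the latent state after each loop (a "noisy rollout").
    """
    rng = rng or random.Random(0)
    states = []
    for _ in range(num_loops):
        h = [math.tanh(weight * x) for x in h]
        if noise_std > 0.0:
            h = [x + rng.gauss(0.0, noise_std) for x in h]
        states.append(h)
    return states


if __name__ == "__main__":
    student = [0.8]   # hypothetical student parameter
    teacher = [0.5]   # hypothetical teacher parameter
    teacher = ema_update(teacher, student)

    # Clean per-loop latents from the teacher, noisy ones from the student.
    clean = looped_forward([0.5, -0.5], weight=teacher[0], num_loops=4)
    noisy = looped_forward([0.5, -0.5], weight=student[0], num_loops=4,
                           noise_std=0.05)
    for step, (c, n) in enumerate(zip(clean, noisy), start=1):
        print(step, c, n)
```

Keeping the latent state from every loop, rather than only the last one, is what would allow a per-step signal to be attached to each iteration. How LoopRPT actually scores those steps is not specified in the text above.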


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare