Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models
Jonathan Williams, Esin Tureci

TL;DR
This paper introduces RLTT, a reinforcement learning framework that rewards the entire latent reasoning trajectory in Looped Language Models (LoopLMs), significantly improving their performance on mathematical reasoning benchmarks without external verifiers.
Contribution
RLTT provides dense, trajectory-level credit assignment for LoopLMs. This resolves the mismatch between standard RL objectives, which credit only the final latent state, and the model's multi-step latent computation, and it substantially improves reasoning accuracy.
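The difference in credit assignment can be illustrated with a small sketch. It assumes, hypothetically, that the sampled answer's log-probability can be read out after each loop step (the list step_logprobs) and that a single scalar advantage is shared by the whole trajectory. The function names and the uniform default weighting are illustrative choices, not RLTT's published formulation.

```python
from typing import Optional, Sequence


def final_state_loss(step_logprobs: Sequence[float], advantage: float) -> float:
    """Policy-gradient surrogate that reads only the final loop step.

    This mirrors how the abstract characterizes GRPO on a LoopLM: the objective
    depends only on the output distribution decoded from the last latent state.
    """
    return -advantage * step_logprobs[-1]


def trajectory_loss(
    step_logprobs: Sequence[float],
    advantage: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Hypothetical trajectory-level surrogate.

    The same advantage is applied to the log-probability read out at every
    loop step, combined with per-step weights (uniform by default; this
    weighting is an illustrative assumption, not RLTT's).
    """
    n = len(step_logprobs)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("expected one weight per loop step")
    return -advantage * sum(w * lp for w, lp in zip(weights, step_logprobs))


# Hypothetical per-loop-step log-probs of one sampled answer (4 loop steps).
logps = [-2.0, -1.5, -1.0, -0.5]
print(final_state_loss(logps, advantage=1.0))  # 0.5
print(trajectory_loss(logps, advantage=1.0))   # 1.25
```

Setting weights to [0, ..., 0, 1] recovers the final-state objective. This makes the contrast explicit: the trajectory version lets every intermediate readout receive credit instead of only the last one.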
Findings
RLTT improves accuracy by +14.4% on MATH-500
RLTT enhances performance by +16.6% on AIME24
RLTT transfers effectively to non-mathematical reasoning tasks
Abstract
Looped Language Models (LoopLMs) perform multi-step latent reasoning prior to token generation and outperform conventional LLMs on reasoning benchmarks at smaller parameter budgets. However, attempts to further improve LoopLM reasoning with reinforcement learning have failed: standard objectives such as Group Relative Policy Optimization (GRPO) assign credit only to the final latent state, creating a fundamental mismatch with the model's internal computation. To resolve this, we introduce RLTT (Reward Latent Thought Trajectories), a reinforcement learning framework that distributes reward across the full latent reasoning trajectory. RLTT provides dense, trajectory-level credit assignment without relying on external verifiers and can directly replace GRPO with negligible overhead. Across extensive experiments with Ouro-2.6B-Thinking under identical training and inference conditions,…
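For reference, GRPO computes its advantages by normalizing each sampled response's reward against the other responses drawn for the same prompt. The sketch below shows that standard group-relative normalization in stdlib Python. The binary correctness rewards, the eps value, and the use of the population standard deviation are assumptions (implementations differ), and the sketch does not specify how RLTT then distributes these advantages over loop steps.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its group.

    All rewards in `rewards` are assumed to come from samples for the same
    prompt. The population std is used here; some implementations use the
    sample std or skip std scaling entirely.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of 4 sampled answers, scored 1.0 if correct else 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

In a group where every sample receives the same reward, all advantages are zero, so that prompt contributes no gradient signal.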
