Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models

Jonathan Williams; Esin Tureci

arXiv:2602.10520·cs.LG·February 13, 2026

Open Access

TL;DR

This paper introduces RLTT, a reinforcement learning framework that rewards the entire latent reasoning trajectory in LoopLMs, significantly enhancing their performance on mathematical reasoning benchmarks without external verifiers.

Contribution

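RLTT provides dense, trajectory-level credit assignment for LoopLMs. It addresses the mismatch between standard RL objectives, which assign credit only to the final latent state, and the model's multi-step latent computation, and it substantially improves reasoning accuracy.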

Findings

01. RLTT improves accuracy by 14.4% on MATH-500.

02. RLTT improves accuracy by 16.6% on AIME24.

03. RLTT transfers effectively to non-mathematical reasoning tasks.

Abstract

Looped Language Models (LoopLMs) perform multi-step latent reasoning before token generation and outperform conventional LLMs on reasoning benchmarks at smaller parameter budgets. However, attempts to further improve LoopLM reasoning with reinforcement learning have failed: standard objectives such as Group Relative Policy Optimization (GRPO) assign credit only to the final latent state, creating a fundamental mismatch with the model's internal computation. To resolve this, we introduce RLTT (Reward Latent Thought Trajectories), a reinforcement learning framework that distributes reward across the full latent reasoning trajectory. RLTT provides dense, trajectory-level credit assignment without relying on external verifiers and can directly replace GRPO with negligible overhead. Across extensive experiments with Ouro-2.6B-Thinking under identical training and inference conditions,…
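
The contrast the abstract draws, between credit on the final latent state only and credit spread over the whole latent trajectory, can be illustrated with a toy sketch. This is not the paper's objective: the uniform weighting over loop iterations, the idea that each loop iteration yields its own log-probability for the sampled response, and all function names below are assumptions made for illustration. Only the group-relative advantage follows the usual GRPO recipe of standardizing rewards within a sampled group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def final_state_credit(logps_per_loop):
    """Baseline: only the last loop iteration's log-probability receives credit."""
    return logps_per_loop[-1]


def trajectory_credit(logps_per_loop, weights=None):
    """Hypothetical trajectory-level credit: weighted sum over all loop iterations.

    Assumption: uniform weights unless given. The paper's actual weighting
    scheme is not stated in the abstract.
    """
    n = len(logps_per_loop)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * lp for w, lp in zip(weights, logps_per_loop))


def surrogate_objective(group_logps, rewards, credit_fn):
    """Mean of advantage-weighted log-probabilities over one sampled group.

    group_logps[i][t] is the (assumed) log-probability of response i when
    decoded from the latent state after loop iteration t.
    """
    advantages = group_relative_advantages(rewards)
    terms = [a * credit_fn(lps) for a, lps in zip(advantages, group_logps)]
    return mean(terms)


if __name__ == "__main__":
    # Two sampled responses, three latent loop iterations each (made-up numbers).
    group_logps = [[-2.0, -1.0, -0.5], [-0.5, -1.0, -2.0]]
    rewards = [1.0, 0.0]
    print(surrogate_objective(group_logps, rewards, final_state_credit))
    print(surrogate_objective(group_logps, rewards, trajectory_credit))
```

In this sketch, swapping `final_state_credit` for `trajectory_credit` is the only change between the two objectives, which mirrors the abstract's claim that the trajectory-level objective can replace GRPO directly. In a real training setup the log-probabilities would be differentiable tensors and the objective would typically include clipping and a KL term, both omitted here.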

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare