Your Latent Reasoning is Secretly Policy Improvement Operator

Arip Asadulaev; Rayan Banerjee; Fakhri Karray; Martin Takac

arXiv:2511.16886 · cs.CL · February 6, 2026

Open Access

TL;DR

This paper reveals that latent reasoning in small models acts as a policy improvement operator, and introduces training schemes inspired by reinforcement learning to enhance recursive reasoning efficiency and effectiveness.

Contribution

It formalizes latent reasoning as a policy improvement process and applies RL-inspired training methods to reduce dead compute and improve recursive reasoning performance.

Findings

01. Reduced total forward passes by 18x while maintaining performance

02. Formalized latent reasoning as a classifier-free guidance and policy improvement algorithm

03. Provided insights into when recursive reasoning improves or hinders model performance

Abstract

Recently, small models with latent recursion have obtained promising results on complex reasoning tasks. These results are typically explained by the theory that such recursion increases a network's depth, allowing it to compactly emulate the capacity of larger models. However, the performance of recursively added layers still lags behind that of one-pass models with the same feed-forward depth. This means that in the looped version, not every recursive step effectively contributes to depth. This raises the question: when and why does latent reasoning improve performance, and when does it result in dead compute? In our work, we analyze the algorithms that latent reasoning provides in order to answer this question. We show that latent reasoning can be formalized as a classifier-free guidance and policy improvement algorithm. Building on these insights, we propose to use training schemes…
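For readers unfamiliar with the two ideas the abstract connects, the following is a minimal sketch of the generic classifier-free guidance rule, not the paper's own formulation (the abstract is truncated here and does not give it). It assumes a toy three-action setting; the names `cfg_logits`, `reweighted_policy`, and the guidance weight `w` are hypothetical. It illustrates that guiding in logit space is the same as reweighting the unconditional policy by the ratio `p_cond / p_uncond` raised to the power `w`, which is one common way such guidance can be read as a policy update.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cfg_logits(cond, uncond, w):
    """Generic classifier-free guidance: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]

def reweighted_policy(p_cond, p_uncond, w):
    """Reweight the unconditional policy by (p_cond / p_uncond) ** w, then renormalize."""
    unnorm = [pu * (pc / pu) ** w for pc, pu in zip(p_cond, p_uncond)]
    total = sum(unnorm)
    return [x / total for x in unnorm]

# Toy logits (made-up numbers, for illustration only).
cond = [2.0, 0.5, -1.0]
uncond = [1.0, 1.0, 0.0]
w = 1.5  # hypothetical guidance weight

guided = softmax(cfg_logits(cond, uncond, w))
reweighted = reweighted_policy(softmax(cond), softmax(uncond), w)
```

In this sketch, `guided` and `reweighted` agree up to floating-point error; `w = 0` recovers the unconditional policy and `w = 1` recovers the conditional one. Whether the paper uses this exact parameterization is an assumption.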

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Machine Learning in Healthcare