Your Latent Reasoning is Secretly Policy Improvement Operator
Arip Asadulaev, Rayan Banerjee, Fakhri Karray, Martin Takac

TL;DR
This paper reveals that latent reasoning in small models acts as a policy improvement operator, and introduces training schemes inspired by reinforcement learning to enhance recursive reasoning efficiency and effectiveness.
Contribution
It formalizes latent reasoning as a policy improvement process and applies RL-inspired training methods to reduce dead compute and improve recursive reasoning performance.
Findings
Reduced total forward passes by 18x while maintaining performance
Formalized latent reasoning as a classifier-free guidance and policy improvement algorithm
Provided insights into when recursive reasoning improves or hinders model performance
Abstract
Recently, small models with latent recursion have obtained promising results on complex reasoning tasks. These results are typically explained by the theory that such recursion increases a network's depth, allowing it to compactly emulate the capacity of larger models. However, the performance of recursively added layers remains behind the capabilities of one-pass models with the same feed-forward depth. This means that in the looped version, not every recursive step effectively contributes to depth. This raises the question: when and why does latent reasoning improve performance, and when does it result in dead compute? In our work, we answer this question by analyzing the algorithms that latent reasoning implements. We show that latent reasoning can be formalized as a classifier-free guidance and policy improvement algorithm. Building on these insights, we propose to use training schemes…
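To make the link between classifier-free guidance and policy improvement concrete, here is a minimal sketch of the textbook forms of both operations. It is not the paper's algorithm: the logits, the guidance weight w, and the function names are hypothetical and chosen only for illustration. It checks the standard identity that guidance in logit space equals an exponentiated reweighting of the unconditional distribution, with the log-ratio log(p_cond / p_uncond) playing the role of an advantage and 1/w the role of a temperature.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cfg_logits(cond, uncond, w):
    """Classifier-free guidance in logit space: uncond + w * (cond - uncond)."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]

def policy_improvement(prior, advantage, beta):
    """KL-regularized step: new(a) is proportional to prior(a) * exp(advantage(a) / beta)."""
    weights = [p * math.exp(a / beta) for p, a in zip(prior, advantage)]
    total = sum(weights)
    return [x / total for x in weights]

# Hypothetical logits over four candidate answers (illustrative numbers only).
cond_logits = [2.0, 0.5, -1.0, 0.0]
uncond_logits = [1.0, 1.0, 0.0, 0.0]
w = 3.0  # assumed guidance weight

# Route 1: apply guidance to the logits, then normalize.
guided = softmax(cfg_logits(cond_logits, uncond_logits, w))

# Route 2: treat the unconditional distribution as the prior policy and
# the log-ratio of conditional to unconditional probability as the advantage.
prior = softmax(uncond_logits)
cond = softmax(cond_logits)
advantage = [math.log(c / p) for c, p in zip(cond, prior)]
improved = policy_improvement(prior, advantage, beta=1.0 / w)

# Both routes yield the same distribution up to floating-point error.
assert all(abs(g - i) < 1e-9 for g, i in zip(guided, improved))
```

The two routes agree because normalization absorbs any constant shift in the advantage. How the paper maps recursive latent steps onto the conditional and unconditional terms is not stated in the truncated abstract, so that mapping is left open here.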
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Machine Learning in Healthcare
