Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning
Alex Ning, Yen-Ling Kuo, Gabe Gomes

TL;DR
This paper introduces an adaptive latent reasoning approach in Transformer models, optimizing reasoning length via reinforcement learning to reduce computation while maintaining accuracy.
Contribution
It develops a novel reinforcement learning method to adaptively determine latent reasoning length, improving efficiency and compressive reasoning capabilities in language models.
Findings
52% reduction in total reasoning length without accuracy loss (Llama 3.2 1B on GSM8K-Aug)
Effective optimization of reasoning length via RL
Enhanced efficiency in latent reasoning models
Abstract
Latent reasoning is a recent development in Transformer language models that has shown potential for compressing reasoning length relative to chain-of-thought reasoning. By passing the information-rich final latent state of the previous step directly into the next sequence, latent reasoning removes the restriction to human language tokens as the medium for reasoning. We develop adaptive-length latent reasoning models and introduce a post-SFT reinforcement-learning methodology that minimizes latent reasoning length while maintaining accuracy. This, in turn, further reduces compute usage and raises the bar on the compressive capabilities of latent reasoning models. Experiments on the Llama 3.2 1B model and the GSM8K-Aug dataset show a 52% drop in total reasoning length with no penalty to accuracy. In future work, we plan to extend to additional models and datasets,…
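The abstract describes an objective that trades reasoning length against accuracy. The excerpt here does not state the paper's actual reward, so the following is only a minimal sketch of one common way such a trade-off could be written. The function name, the `length_weight` parameter, and the choice to penalize length only on correct answers are all assumptions made for illustration.

```python
def length_penalized_reward(
    is_correct: bool,
    num_latent_steps: int,
    max_latent_steps: int,
    length_weight: float = 0.1,  # hypothetical trade-off coefficient
) -> float:
    """Illustrative reward: 1.0 for a correct answer, minus a small
    penalty proportional to the fraction of the latent-step budget used.

    This is an assumed form, not the reward defined in the paper.
    """
    if max_latent_steps <= 0:
        raise ValueError("max_latent_steps must be positive")
    if not 0 <= num_latent_steps <= max_latent_steps:
        raise ValueError("num_latent_steps must lie in [0, max_latent_steps]")

    accuracy_term = 1.0 if is_correct else 0.0
    length_fraction = num_latent_steps / max_latent_steps

    # Penalize length only when the answer is correct, so that a short
    # wrong answer is never preferred over a longer correct one.
    penalty = length_weight * length_fraction if is_correct else 0.0
    return accuracy_term - penalty


if __name__ == "__main__":
    # A correct answer reached in fewer latent steps scores higher.
    print(length_penalized_reward(True, 2, 6))
    print(length_penalized_reward(True, 6, 6))
    print(length_penalized_reward(False, 1, 6))
```

With `length_weight` below 1, any correct answer still outscores any incorrect one, so in this sketch the length term only breaks ties among correct rollouts.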
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
