Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Haolin Chen, Yihao Feng, Zuxin Liu, Weiran Yao, Akshara Prabhakar, Shelby Heinecke, Ricky Ho, Phil Mui, Silvio Savarese, Caiming Xiong, Huan Wang

TL;DR
This paper introduces LaTRO, a novel framework that enhances large language models' reasoning abilities by optimizing latent reasoning distributions during training, leading to significant improvements in complex reasoning tasks without external feedback.
Contribution
LaTRO is a new variational training method that unlocks and improves latent reasoning capabilities in pre-trained LLMs without external reward models.
Findings
LaTRO improves GSM8K zero-shot accuracy by an average of 12.5% over base models and 9.6% over supervised fine-tuning.
LaTRO enhances reasoning across multiple model architectures.
The approach enables self-improvement of reasoning in LLMs.
Abstract
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps. While prompt-based methods like Chain-of-Thought (CoT) can improve LLM reasoning at inference time, optimizing reasoning capabilities during training remains challenging. We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution and optimizes it via variational approaches. LaTRO enables LLMs to concurrently improve both their reasoning process and ability to evaluate reasoning quality, without requiring external feedback or reward models. We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures. On GSM8K, LaTRO improves zero-shot accuracy by an average of 12.5% over base models and 9.6% over supervised fine-tuning…
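To make "sampling from a latent distribution" and "variational approaches" concrete, here is a minimal sketch of the generic evidence lower bound that formulations of this kind usually build on. The notation is an assumption and does not come from this summary: x is the question, y the answer, z a latent reasoning rationale, p_theta the language model, and q a proposal distribution over rationales. The paper's exact objective may differ.

```latex
% Marginalize the answer likelihood over latent rationales z,
% then apply Jensen's inequality with a proposal q(z | x).
\begin{align}
\log p_\theta(y \mid x)
  &= \log \int p_\theta(y \mid x, z)\, p_\theta(z \mid x)\, dz \\
  &= \log \mathbb{E}_{z \sim q(\cdot \mid x)}
     \left[ \frac{p_\theta(y \mid x, z)\, p_\theta(z \mid x)}{q(z \mid x)} \right] \\
  &\ge \mathbb{E}_{z \sim q(\cdot \mid x)}
     \left[ \log p_\theta(y \mid x, z) \right]
     - \mathrm{KL}\!\left( q(z \mid x) \,\|\, p_\theta(z \mid x) \right)
\end{align}
```

Under this reading, the term log p_theta(y | x, z) scores a sampled rationale by how likely it makes the correct answer under the model itself. That is one plausible sense in which training of this kind needs no external reward model.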
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Balanced Selection
