Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, Mao Yang

TL;DR
The paper presents rStar, a mutual reasoning approach that enhances small language models' problem-solving abilities by using self-play and mutual verification, significantly improving their reasoning accuracy without additional fine-tuning.
Contribution
Introducing rStar, a novel self-play mutual reasoning method that boosts small language models' reasoning performance through a decoupled generation and verification process.
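The decoupled generation and verification step can be illustrated with a minimal Python sketch. It assumes one plausible agreement check: the discriminator SLM sees only a prefix of the generator's trajectory, cut at a random step, and must reach a final answer on its own. The trajectory counts as mutually consistent when both answers match. The `discriminator` callable, the random cut, and the names `is_mutually_consistent` and `select_trajectories` are hypothetical illustrations, not rStar's released code.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical interface for the discriminator SLM: given the question and a
# partial trajectory (a prefix of reasoning steps), it returns its final answer.
Discriminator = Callable[[str, List[str]], str]
Trajectory = Tuple[List[str], str]  # (reasoning steps, final answer)


def is_mutually_consistent(question: str, steps: List[str], answer: str,
                           discriminator: Discriminator,
                           rng: Optional[random.Random] = None) -> bool:
    """True if the discriminator, given only a prefix of the steps as a hint,
    arrives at the same final answer as the generator (assumed scheme)."""
    rng = rng or random.Random()
    # Assumption: cut the trajectory at a random step and hide everything after it.
    cut = rng.randint(0, max(len(steps) - 1, 0))
    their_answer = discriminator(question, steps[:cut])
    # Agreement between the two models is the verification signal.
    return their_answer.strip() == answer.strip()


def select_trajectories(question: str, candidates: List[Trajectory],
                        discriminator: Discriminator,
                        rng: Optional[random.Random] = None) -> List[Trajectory]:
    """Keep only the candidates the discriminator agrees with."""
    return [(steps, ans) for steps, ans in candidates
            if is_mutually_consistent(question, steps, ans, discriminator, rng)]


# Toy usage: a stub discriminator that always answers "14" for "2 + 3 * 4".
def stub_discriminator(question: str, prefix: List[str]) -> str:
    return "14"


candidates = [(["3 * 4 = 12", "2 + 12 = 14"], "14"),
              (["2 + 3 = 5", "5 * 4 = 20"], "20")]
kept = select_trajectories("2 + 3 * 4", candidates, stub_discriminator, random.Random(0))
print(kept)  # only the trajectory ending in "14" survives
```

Hiding the tail of the trajectory is one way to keep the discriminator from simply echoing the generator's answer; in a real setup the stub would be replaced by a call to the second SLM.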
Findings
rStar significantly improves reasoning accuracy across multiple SLMs.
GSM8K accuracy for LLaMA2-7B increases from 12.51% to 63.91%.
Code implementation will be publicly available.
Abstract
This paper introduces rStar, a self-play mutual reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. Reasoning trajectories on which both models agree are considered mutually consistent and are therefore more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy…
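The MCTS component can be sketched as standard UCT search over a tree whose edges are reasoning actions. This is a minimal, self-contained illustration under stated assumptions: the action names in `ACTIONS`, the exploration constant `c = 1.4`, and `reward_fn` (a stand-in for completing a trajectory and scoring its answer) are hypothetical, and in a real system each expansion would call the target SLM to generate the step text. Whether rStar uses exactly this UCT form is not stated in this summary.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical placeholder names for the "human-like reasoning actions";
# the real action set is not listed in this summary.
ACTIONS = ("one_step_thought", "remaining_steps", "sub_question", "rephrase")


@dataclass(eq=False)
class Node:
    action: Optional[str] = None  # action that led from the parent to this node
    parent: Optional["Node"] = field(default=None, repr=False)
    visits: int = 0
    total_reward: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict, repr=False)

    def path(self) -> List[str]:
        """Sequence of actions from the root down to this node."""
        node, actions = self, []
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return actions[::-1]

    def expand(self) -> None:
        """Add one child per action (a real system would have the SLM generate each step)."""
        for a in ACTIONS:
            self.children.setdefault(a, Node(action=a, parent=self))


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited actions are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select(root: Node) -> Node:
    """Descend from the root, always following the child with the highest UCT score."""
    node = root
    while node.children:
        parent_visits = node.visits
        node = max(node.children.values(), key=lambda ch: uct(ch, parent_visits))
    return node


def backpropagate(leaf: Node, reward: float) -> None:
    """Add the reward to every node on the path back to the root."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def run_search(reward_fn: Callable[[List[str]], float],
               iterations: int = 200, max_depth: int = 2) -> Node:
    root = Node()
    for _ in range(iterations):
        leaf = select(root)
        if len(leaf.path()) < max_depth:
            leaf.expand()
            leaf = next(iter(leaf.children.values()))
        # reward_fn stands in for rolling out to a final answer and scoring it.
        backpropagate(leaf, reward_fn(leaf.path()))
    return root


# Toy reward (hypothetical): only paths that start with "sub_question" succeed.
root = run_search(lambda p: 1.0 if p and p[0] == "sub_question" else 0.0)
best = max(root.children.values(), key=lambda ch: ch.visits)
print(best.action, best.visits, root.visits)
```

In this toy run the rewarded `sub_question` branch collects most of the root's visits, while the exploration bonus still sends a few visits to the other actions; that balance is what UCT is meant to provide when choosing among candidate reasoning steps.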
