Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Zhenting Qi; Mingyuan Ma; Jiahang Xu; Li Lyna Zhang; Fan Yang; Mao Yang

arXiv:2408.06195·cs.CL·August 13, 2024·3 cites

TL;DR

The paper presents rStar, a mutual reasoning approach that enhances small language models' problem-solving abilities by using self-play and mutual verification, significantly improving their reasoning accuracy without additional fine-tuning.

Contribution

Introducing rStar, a novel self-play mutual reasoning method that boosts small language models' reasoning performance through a decoupled generation and verification process.

Findings

01. rStar significantly improves reasoning accuracy across multiple SLMs.
02. GSM8K accuracy for LLaMA2-7B increases from 12.51% to 63.91%.
03. Code implementation will be publicly available.

Abstract

This paper introduces rStar, a self-play mutual reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM, with capabilities similar to those of the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. Trajectories on which both models agree are considered mutually consistent and are thus more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy…
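The generation-discrimination split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Trajectory` fields, the `Discriminator` callable, the `keep_fraction` prefix rule, and the fallback to the best-scored candidate are all assumptions made for illustration. The abstract only states that a second SLM verifies each trajectory and that agreed trajectories are treated as mutually consistent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    steps: List[str]  # intermediate reasoning steps from the target SLM
    answer: str       # final answer reached by the target SLM
    score: float      # hypothetical generator-side score (e.g. a search reward)


# Hypothetical discriminator interface: given the question and a partial
# trajectory, return the final answer the second SLM arrives at.
Discriminator = Callable[[str, List[str]], str]


def mutually_consistent(
    question: str,
    candidates: List[Trajectory],
    discriminator: Discriminator,
    keep_fraction: float = 0.5,  # assumed share of steps shown to the discriminator
) -> List[Trajectory]:
    """Keep trajectories whose answer the discriminator reproduces."""
    agreed = []
    for traj in candidates:
        cut = max(1, int(len(traj.steps) * keep_fraction))
        completed = discriminator(question, traj.steps[:cut])
        if completed.strip() == traj.answer.strip():
            agreed.append(traj)
    return agreed


def select_answer(
    question: str,
    candidates: List[Trajectory],
    discriminator: Discriminator,
) -> Optional[str]:
    """Pick the best-scored agreed trajectory; fall back to all candidates."""
    agreed = mutually_consistent(question, candidates, discriminator)
    pool = agreed or candidates  # fallback policy is an assumption
    if not pool:
        return None
    return max(pool, key=lambda t: t.score).answer


# Toy usage with a stand-in discriminator that always answers "18".
toy_candidates = [
    Trajectory(["16 - 3 - 4 = 9", "9 * 2 = 18"], "18", 0.4),
    Trajectory(["16 - 3 = 13", "13 * 2 = 26"], "26", 0.9),
]
toy_discriminator = lambda question, prefix: "18"
chosen = select_answer("toy question", toy_candidates, toy_discriminator)
```

In this toy run the higher-scored trajectory is discarded because the stand-in discriminator does not reproduce its answer, so agreement between the two models, rather than the generator's own score, decides the outcome.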

Taxonomy

Topics: Digital Rights Management and Security

Methods: Sparse Evolutionary Training