Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards

Hyeonbin Hwang; Doyoung Kim; Seungone Kim; Seonghyeon Ye; Minjoon Seo

arXiv:2404.10346 · cs.CL · October 4, 2024 · 1 citation

TL;DR

Self-Explore lets language models self-improve their reasoning by identifying the first mistake in their own rationales and learning from it, yielding significant performance gains without human-annotated data.

Contribution

The paper introduces Self-Explore, a novel method where LLMs self-assess and refine their reasoning by focusing on their initial errors, reducing reliance on costly human rationales.

Findings

01

Achieves an average 11.57% improvement on GSM8K over supervised fine-tuning (SFT) across three LLMs

02

Achieves an average 2.89% improvement on MATH over SFT across three LLMs

03

Demonstrates that LLMs can improve their own reasoning without human-authored or proprietary-model rationales

Abstract

Training on large amounts of rationales (i.e., CoT fine-tuning) is effective at improving the reasoning capabilities of large language models (LLMs). However, acquiring human-authored rationales or augmenting rationales from proprietary models is costly and not scalable. In this paper, we study the problem of whether LLMs could self-improve their reasoning capabilities. To this end, we propose Self-Explore, where the LLM is tasked to explore the first wrong step (i.e., the first pit) within the rationale and use such signals as fine-grained rewards for further improvement. On the GSM8K and MATH test sets, Self-Explore achieves 11.57% and 2.89% improvement on average across three LLMs compared to supervised fine-tuning (SFT). Our code is available at https://github.com/hbin0701/Self-Explore.
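
The first-pit search described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the hbin0701/Self-Explore repository holds that). Here `sample` stands in for drawing k continuations from the model given a prefix, `is_correct` stands in for checking a continuation's final answer against the gold answer, and the names `find_first_pit`, `build_step_pair`, `StepPair` and the default k = 4 are all hypothetical. A step is treated as the first pit when none of the k continuations sampled after it reaches the correct answer. Pairing a correct continuation from the pre-pit prefix against the original rationale from the pit onward is one plausible way to turn that signal into a fine-grained, step-level preference example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not the Self-Explore repo's API):
#   sample(prefix, k) -> k model continuations of `prefix`
#   is_correct(continuation) -> True if its final answer matches the gold answer
Sampler = Callable[[str, int], List[str]]
Checker = Callable[[str], bool]


@dataclass
class StepPair:
    prompt: str    # question plus the steps before the first pit
    chosen: str    # a sampled continuation from that prefix that reaches the gold answer
    rejected: str  # the original rationale from the first pit onward


def find_first_pit(question: str, steps: List[str], sample: Sampler,
                   is_correct: Checker, k: int = 4) -> Optional[int]:
    """Return the index of the first step after which none of k sampled
    continuations is correct, or None if no such step exists."""
    for i in range(len(steps)):
        prefix = question + "".join(steps[: i + 1])
        if not any(is_correct(c) for c in sample(prefix, k)):
            return i
    return None


def build_step_pair(question: str, steps: List[str], pit: int, sample: Sampler,
                    is_correct: Checker, k: int = 4) -> Optional[StepPair]:
    """Pair a correct continuation from the pre-pit prefix against the
    original (wrong) continuation starting at the pit."""
    prefix = question + "".join(steps[:pit])
    for candidate in sample(prefix, k):
        if is_correct(candidate):
            return StepPair(prompt=prefix, chosen=candidate,
                            rejected="".join(steps[pit:]))
    return None  # no correct continuation found; skip this example


if __name__ == "__main__":
    # Toy stand-ins: continuations go wrong once the bad step "2+12=15" is in the prefix.
    def toy_sample(prefix: str, k: int) -> List[str]:
        return ["Answer: 15"] * k if "15" in prefix else ["2+12=14\nAnswer: 14"] * k

    def toy_check(continuation: str) -> bool:
        return continuation.strip().endswith("Answer: 14")

    question = "Q: 2+3*4?\n"
    steps = ["3*4=12\n", "2+12=15\n", "Answer: 15"]
    pit = find_first_pit(question, steps, toy_sample, toy_check)
    print(pit)  # 1
    if pit is not None:
        print(build_step_pair(question, steps, pit, toy_sample, toy_check))
```

Pairs like these would then feed a preference-style training objective; the paper should be consulted for the exact objective, sampling settings, and how pairs are constructed.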

Code & Models

Repositories

hbin0701/Self-Explore
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques