AlphaMath Almost Zero: Process Supervision without Process

Guoxin Chen; Minpeng Liao; Chengxi Li; Kai Fan

arXiv:2405.03553 · cs.CL · September 30, 2024 · 6 citations

TL;DR

AlphaMath introduces a process supervision method for mathematical reasoning in LLMs that uses Monte Carlo Tree Search (MCTS) and a value model, eliminating the need for costly human or GPT-4 process annotations, and achieves results comparable or superior to state-of-the-art methods.

Contribution

The paper presents a framework that lets LLMs improve their mathematical reasoning autonomously, using MCTS and a value model in place of human or GPT-4 process annotations.

Findings

01  Achieves comparable or superior results to state-of-the-art methods.

02  Operates effectively without human or GPT-4 process supervision.

03  Demonstrates strong performance on in-domain and out-of-domain datasets.

Abstract

Although recent advancements in large language models (LLMs) have significantly improved their performance on various tasks, they still face challenges with complex and symbolic multi-step reasoning, particularly in mathematical reasoning. To bolster the mathematical reasoning capabilities of LLMs, most existing efforts concentrate on seeking assistance from either domain experts or GPT-4 for high-quality process-supervised data, which is not only expensive but also labor-intensive. In our study, we propose an innovative framework, AlphaMath, that bypasses the need for process annotations (from humans or GPTs) by leveraging Monte Carlo Tree Search (MCTS). This framework focuses on unleashing the potential of a well-pretrained LLM to autonomously enhance its mathematical reasoning. Specifically, we integrate a value model with the LLM, automatically generating both process supervision…
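The idea of deriving step-level supervision from search, rather than from annotators, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each sampled solution is scored only by whether its final answer is correct (+1 or -1, a convention chosen here for illustration) and that the value of a partial solution is estimated as the mean outcome of all rollouts passing through it. A full MCTS also has selection and expansion phases, which are omitted. The names `Node`, `backup`, `step_labels`, and the example rollouts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One partial solution (a prefix of reasoning steps) in the search tree."""
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    @property
    def value(self) -> float:
        # Mean outcome of all rollouts that passed through this prefix.
        return self.value_sum / self.visits if self.visits else 0.0


def backup(root: Node, steps: list[str], reward: float) -> None:
    """Propagate a final-answer reward to every prefix along one rollout."""
    node = root
    node.visits += 1
    node.value_sum += reward
    for step in steps:
        node = node.children.setdefault(step, Node())
        node.visits += 1
        node.value_sum += reward


def step_labels(root: Node) -> dict[tuple[str, ...], float]:
    """Collect (prefix -> estimated value) pairs usable as value-model targets."""
    labels = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for step, child in node.children.items():
            path = prefix + (step,)
            labels[path] = child.value
            stack.append((path, child))
    return labels


# Hypothetical rollouts for one question: (reasoning steps, final answer correct?)
# Assumed reward convention: +1 for a correct final answer, -1 otherwise.
rollouts = [
    (["a", "b"], True),
    (["a", "c"], False),
    (["d", "e"], False),
    (["a", "b"], True),
]

root = Node()
for steps, correct in rollouts:
    backup(root, steps, 1.0 if correct else -1.0)

labels = step_labels(root)
```

In this toy example the prefix `("a",)` receives a value of 1/3 because two of the three rollouts through it end correctly, while `("a", "b")` receives 1.0 and `("d",)` receives -1.0. No step was ever labeled by hand; every intermediate value comes from final-answer checks alone.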

Code & Models

Repositories

MARIO-Math-Reasoning/Super_MARIO
Official

Models

MARIO-Math-Reasoning/SVPO_7B
model (Hugging Face) · 6 downloads · 4 likes

Datasets

MARIO-Math-Reasoning/AlphaMath-Trainset
dataset · 53 downloads

Taxonomy

Topics: Business Process Modeling and Analysis

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout