AlphaMath Almost Zero: Process Supervision without Process
Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan

TL;DR
AlphaMath introduces a process supervision method for mathematical reasoning in LLMs that uses Monte Carlo Tree Search (MCTS) and a value model, eliminating the need for costly human or GPT-4 process annotations, and achieves results comparable or superior to state-of-the-art methods.
Contribution
The paper presents a framework that enables LLMs to improve their mathematical reasoning autonomously, without human or GPT-4 process annotations, by combining MCTS with a value model.
Findings
Achieves comparable or superior results to state-of-the-art methods.
Operates effectively without human or GPT-4 process supervision.
Demonstrates strong performance on in-domain and out-of-domain datasets.
Abstract
Although recent advancements in large language models (LLMs) have significantly improved their performance on various tasks, they still face challenges with complex and symbolic multi-step reasoning, particularly in mathematical reasoning. To bolster the mathematical reasoning capabilities of LLMs, most existing efforts concentrate on seeking assistance from either domain experts or GPT-4 for high-quality process-supervised data, which is not only expensive but also labor-intensive. In our study, we propose an innovative framework, AlphaMath, that bypasses the need for process annotations (from humans or GPTs) by leveraging Monte Carlo Tree Search (MCTS). This framework focuses on unleashing the potential of a well-pretrained LLM to autonomously enhance its mathematical reasoning. Specifically, we integrate a value model with the LLM, automatically generating both process supervision…
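The idea of deriving step-level signals from search rather than from annotation can be illustrated with a minimal sketch. This is not the authors' implementation. The toy task (reach a target sum in a fixed number of additions), the `value_model` stub, the ±1 outcome reward, and the UCT exploration constant are all assumptions made for illustration; in the setting the abstract describes, candidate steps would be proposed by an LLM rather than drawn from a fixed action set.

```python
import math
import random

ACTIONS = (1, 2, 3)   # hypothetical "reasoning steps": add 1, 2, or 3
DEPTH = 4             # every toy solution has exactly 4 steps
TARGET = 10           # known final answer; the only label we assume


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of steps taken so far
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        # Mean of all values backed up through this node.
        return self.value_sum / self.visits if self.visits else 0.0


def is_terminal(state):
    return len(state) == DEPTH


def outcome_reward(state):
    # Assumed reward: +1 for a correct final answer, -1 otherwise.
    return 1.0 if sum(state) == TARGET else -1.0


def value_model(state):
    # Stand-in for a learned value head; an untrained one is uninformative.
    return 0.0


def select_child(node, c=1.4):
    # Standard UCT rule: exploit high Q, explore rarely visited children.
    def uct(child):
        return child.q() + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=uct)


def search(num_simulations=500, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(num_simulations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = select_child(node)
        # Expansion: add one untried step.
        if not is_terminal(node.state):
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # Evaluation: true outcome at the end, value estimate otherwise.
        if is_terminal(node.state):
            value = outcome_reward(node.state)
        else:
            value = value_model(node.state)
        # Backup: propagate the value to every ancestor.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


def collect_step_labels(root):
    # Map each explored partial solution to its Q-value estimate.
    labels = {}
    stack = list(root.children.values())
    while stack:
        node = stack.pop()
        labels[node.state] = node.q()
        stack.extend(node.children.values())
    return labels
```

Calling `collect_step_labels(search())` yields a dictionary from partial solutions to value estimates obtained from final-answer correctness alone. In this toy setup, any prefix from which the target can no longer be reached ends with a non-positive estimate, which is the kind of step-level signal that could then be used as a training target for a value model.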
Taxonomy
Topics: Business Process Modeling and Analysis
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout
