AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Zihan Liu, Yang Chen, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

TL;DR
AceMath is a suite of math reasoning models and math reward models built through targeted post-training; its largest instruction-tuned model, AceMath-72B-Instruct, outperforms Qwen2.5-Math-72B-Instruct, GPT-4o and Claude-3.5 Sonnet on math reasoning.
Contribution
The paper presents a supervised fine-tuning process that first targets general domains and then the math domain, a math-specialized reward model, and AceMath-RewardBench, a benchmark for evaluating math reward models across diverse problems and difficulty levels.
Findings
AceMath-72B-Instruct outperforms GPT-4o and Claude-3.5 Sonnet in math reasoning.
AceMath-72B-RM surpasses state-of-the-art reward models.
Combining AceMath models yields the highest rm@8 scores across benchmarks.
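The rm@8 metric in the last finding can be illustrated with a small sketch. It assumes rm@8 means best-of-8 selection: for each problem, eight candidate solutions are scored by the reward model, the highest-scoring one is kept, and the metric is the fraction of problems where that kept solution is correct. The function name `rm_at_n`, the `(score, is_correct)` data layout, and the toy values are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence, Tuple

# Hypothetical layout: (reward model score, whether the final answer is correct).
Candidate = Tuple[float, bool]


def rm_at_n(problems: Sequence[Sequence[Candidate]], n: int = 8) -> float:
    """Fraction of problems whose top-scored candidate (among the first n) is correct."""
    if not problems:
        raise ValueError("need at least one problem")
    hits = 0
    for candidates in problems:
        pool = candidates[:n]  # only the first n sampled solutions count
        if not pool:
            raise ValueError("every problem needs at least one candidate")
        # Keep the candidate the reward model scores highest.
        _, best_is_correct = max(pool, key=lambda c: c[0])
        hits += int(best_is_correct)
    return hits / len(problems)


# Toy data with two candidates per problem (illustrative values only).
toy = [
    [(0.9, True), (0.2, False)],
    [(0.1, True), (0.8, False)],
    [(0.4, False), (0.7, True)],
    [(0.3, False), (0.6, True)],
]
print(rm_at_n(toy, n=2))
```

Under this reading, the metric depends on both models: the instruction-tuned model has to produce at least one correct solution among the samples, and the reward model has to rank it first.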
Abstract
In this paper, we introduce AceMath, a suite of frontier math models that excel in solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct, greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o and Claude-3.5 Sonnet. To develop a math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. After that, we present a systematic…
Code & Models
- 🤗 nvidia/AceMath-1.5B-Instruct (model) · 2.0k dl · ♡ 15
- 🤗 nvidia/AceMath-7B-Instruct (model) · 469 dl · ♡ 31
- 🤗 nvidia/AceMath-72B-Instruct (model) · 756 dl · ♡ 20
- 🤗 nvidia/AceMath-72B-RM (model) · 855 dl · ♡ 9
- 🤗 nvidia/AceMath-7B-RM (model) · 1.8k dl · ♡ 6
- 🤗 nvidia/AceInstruct-1.5B (model) · 1.2k dl · ♡ 20
- 🤗 nvidia/AceInstruct-7B (model) · 1.7k dl · ♡ 21
- 🤗 nvidia/AceInstruct-72B (model) · 355 dl · ♡ 17
- 🤗 inarikami/AceMath-72B-Instruct-GGUF (model) · 17 dl
- 🤗 redponike/AceMath-72B-Instruct-GGUF (model) · 20 dl · ♡ 1
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Sparse Evolutionary Training
