A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning

Hiroshi Yoshihara; Taiki Yamaguchi; Yuichi Inoue

arXiv:2507.08267 · cs.LG · July 14, 2025

TL;DR

This paper presents a practical two-stage training recipe for mathematical reasoning in Large Language Models: an extended Supervised Fine-Tuning (SFT) phase that maximizes accuracy, followed by a Reinforcement Learning phase (GRPO) that improves token efficiency.

Contribution

It introduces a systematic methodology for combining SFT and RL in which the two play complementary roles: prolonged SFT pushes accuracy to its limit, and GRPO then shortens solutions while preserving that accuracy.

Findings

01

Extending SFT to 10 epochs boosts performance.

02

GRPO primarily reduces solution length while maintaining accuracy.

03

The recipe achieves top-tier results on the AIMO benchmark.

Abstract

Enhancing the mathematical reasoning of Large Language Models (LLMs) is a pivotal challenge in advancing AI capabilities. While Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are the dominant training paradigms, a systematic methodology for combining them to maximize both accuracy and efficiency remains largely unexplored. This paper introduces a practical and effective training recipe that strategically integrates extended SFT with RL from online inference (GRPO). We posit that these methods play complementary, not competing, roles: a prolonged SFT phase first pushes the model's accuracy to its limits, after which a GRPO phase dramatically improves token efficiency while preserving this peak performance. Our experiments reveal that extending SFT for as many as 10 epochs is crucial for performance breakthroughs, and that the primary role of GRPO in this framework is to…
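GRPO (Group Relative Policy Optimization) samples several solutions per problem and scores each one relative to the others in its group, so no separate value model is needed. The sketch below illustrates that group-relative normalization and one way it can push a model toward shorter solutions. It assumes a hypothetical reward, `length_aware_reward`, which pays nothing for wrong answers and adds a brevity bonus to correct ones. This is an illustrative stand-in, not necessarily the reward used in the paper. The names `grpo_advantages`, `length_weight`, and `max_tokens` are likewise hypothetical.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each completion's reward by the
    mean and (population) standard deviation of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def length_aware_reward(is_correct, num_tokens, max_tokens, length_weight=0.5):
    """HYPOTHETICAL reward, for illustration only: 0 for a wrong answer;
    1 for a correct answer, plus a bonus that shrinks linearly with length."""
    if not is_correct:
        return 0.0
    brevity = 1.0 - min(num_tokens, max_tokens) / max_tokens
    return 1.0 + length_weight * brevity


if __name__ == "__main__":
    # One prompt, four sampled solutions: (answer correct?, solution length in tokens)
    group = [(True, 2000), (True, 6000), (False, 3000), (True, 8000)]
    rewards = [length_aware_reward(c, n, max_tokens=8000) for c, n in group]
    print(rewards)  # [1.375, 1.125, 0.0, 1.0]
    print(grpo_advantages(rewards))
```

In this toy group, the shortest correct solution receives the largest positive advantage and the wrong answer the most negative one. Because a correct answer always outscores a wrong one, a policy update can favor brevity only among solutions that are already correct. This matches, in spirit, the abstract's claim that GRPO improves token efficiency while preserving accuracy.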

Code & Models

Repositories

analokmaus/kaggle-aimo2-fast-math-r1
PyTorch · Official

Taxonomy

Topics: Machine Learning in Materials Science · Mathematics, Computing, and Information Processing · Topic Modeling

Methods: Shrink and Fine-Tune