Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Alon Albalak; Duy Phung; Nathan Lile; Rafael Rafailov; Kanishk Gandhi; Louis Castricato; Anikait Singh; Chase Blagden; Violet Xiang; Dakota Mahan; Nick Haber

arXiv:2502.17387·cs.LG·February 25, 2025

TL;DR

Big-Math is a large, high-quality math dataset with over 250,000 problems designed for reinforcement learning in language models, balancing quality and quantity to improve reasoning capabilities.

Contribution

The paper introduces Big-Math, a rigorously filtered and curated large-scale math dataset with verified solutions, including reformulated questions for diverse reasoning tasks.

Findings

01 Big-Math surpasses existing datasets in size and quality.
02 Contains diverse problem domains and difficulty levels.
03 Enables improved reasoning in language models.

Abstract

Increasing interest in reasoning models has led math to become a prominent testing ground for algorithmic and methodological improvements. However, existing open math datasets either contain a small collection of high-quality, human-written problems or a large corpus of machine-generated problems of uncertain quality, forcing researchers to choose between quality and quantity. In this work, we present Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, purposefully made for reinforcement learning (RL). To create Big-Math, we rigorously filter, clean, and curate openly available datasets, extracting questions that satisfy our three desiderata: (1) problems with uniquely verifiable solutions, (2) problems that are open-ended, and (3) problems with a closed-form solution. To ensure the quality of Big-Math, we manually verify each step in our filtering…
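
The three desiderata listed in the abstract can be pictured as a simple keep/drop filter over candidate problems. The sketch below is only an illustration under stated assumptions: the `Problem` type, the regular expressions, and the 40-character answer limit are all hypothetical stand-ins, not the filters used to build Big-Math.

```python
import re
from dataclasses import dataclass


@dataclass
class Problem:
    question: str
    answer: str


# Hypothetical heuristics for illustration; the paper's actual filters are not shown here.
MULTIPLE_CHOICE = re.compile(r"\(\s*[A-E]\s*\)")
TRUE_FALSE = re.compile(r"\b(true or false|yes or no)\b", re.IGNORECASE)
PROOF_WORDS = re.compile(r"\b(prove|show that)\b", re.IGNORECASE)


def is_open_ended(p: Problem) -> bool:
    # Drop multiple-choice and true/false questions, which can be guessed.
    return not (MULTIPLE_CHOICE.search(p.question) or TRUE_FALSE.search(p.question))


def has_closed_form_answer(p: Problem) -> bool:
    # Drop proof-style questions and problems with no recorded final answer.
    return bool(p.answer.strip()) and not PROOF_WORDS.search(p.question)


def is_uniquely_verifiable(p: Problem) -> bool:
    # Assumed proxy: one short answer with no listed alternatives.
    answer = p.answer.strip()
    return 0 < len(answer) <= 40 and " or " not in answer


def keep(p: Problem) -> bool:
    return is_open_ended(p) and has_closed_form_answer(p) and is_uniquely_verifiable(p)


problems = [
    Problem("What is 2 + 3?", "5"),
    Problem("Which is prime? (A) 4 (B) 5 (C) 6", "B"),
    Problem("Prove that sqrt(2) is irrational.", ""),
]
kept = [p for p in problems if keep(p)]
```

With these assumed rules, only the first toy problem survives: the second is multiple choice and the third is a proof with no closed-form answer.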

Code & Models

Repositories

synthlabsai/big-math (Official)

Models

benchang1110/Qwen2.5-Taiwan-3B-Reason-GRPO
model · 12 dl · ♡ 1

Taxonomy

Topics: Reinforcement Learning in Robotics