Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Alon Albalak, Duy Phung, Nathan Lile, Rafael Rafailov, Kanishk Gandhi, Louis Castricato, Anikait Singh, Chase Blagden, Violet Xiang, Dakota Mahan, Nick Haber

TL;DR
Big-Math is a large, high-quality math dataset with over 250,000 problems designed for reinforcement learning in language models, balancing quality and quantity to improve reasoning capabilities.
Contribution
The paper introduces Big-Math, a rigorously filtered and curated large-scale math dataset with verifiable answers, including reformulated questions for diverse reasoning tasks.
Findings
Big-Math pairs the scale of large machine-generated corpora with the quality of small human-written collections.
Contains diverse problem domains and difficulty levels.
Enables improved reasoning in language models.
Abstract
Increasing interest in reasoning models has led math to become a prominent testing ground for algorithmic and methodological improvements. However, existing open math datasets either contain a small collection of high-quality, human-written problems or a large corpus of machine-generated problems of uncertain quality, forcing researchers to choose between quality and quantity. In this work, we present Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, purposefully made for reinforcement learning (RL). To create Big-Math, we rigorously filter, clean, and curate openly available datasets, extracting questions that satisfy our three desiderata: (1) problems with uniquely verifiable solutions, (2) problems that are open-ended, and (3) problems with a closed-form solution. To ensure the quality of Big-Math, we manually verify each step in our filtering…
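To make the three desiderata concrete, the following is a minimal sketch of how such a filter might look. It is not the authors' pipeline: the `Problem` type, the regular expressions, and the five-token answer limit are all hypothetical heuristics chosen for illustration, and a real filter would need far more careful rules and manual verification.

```python
import re
from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical record type: a question and its reference answer."""
    question: str
    answer: str


# Illustrative heuristics only (assumptions, not the paper's filters).
MULTIPLE_CHOICE = re.compile(r"\(\s*[A-E]\s*\)|^\s*[A-E][.)]\s", re.MULTILINE)
YES_NO = re.compile(r"^\s*(is|are|does|do|can|true or false)\b", re.IGNORECASE)
PROOF = re.compile(r"\b(prove|show that|demonstrate that)\b", re.IGNORECASE)
ALTERNATIVES = re.compile(r"\bor\b", re.IGNORECASE)


def is_uniquely_verifiable(p: Problem) -> bool:
    # (1) The answer is non-empty and does not list several alternatives.
    answer = p.answer.strip()
    return bool(answer) and not ALTERNATIVES.search(answer)


def is_open_ended(p: Problem) -> bool:
    # (2) Reject multiple-choice and yes/no questions, which can be guessed.
    return not MULTIPLE_CHOICE.search(p.question) and not YES_NO.search(p.question)


def has_closed_form_answer(p: Problem) -> bool:
    # (3) Reject proof requests and long free-text answers.
    return not PROOF.search(p.question) and 0 < len(p.answer.split()) <= 5


def keep(p: Problem) -> bool:
    return is_uniquely_verifiable(p) and is_open_ended(p) and has_closed_form_answer(p)


candidates = [
    Problem("What is the value of 3 + 4 * 2?", "11"),
    Problem("Which is prime? (A) 4 (B) 5 (C) 6", "B"),
    Problem("Is 7 a prime number?", "Yes"),
    Problem("Prove that sqrt(2) is irrational.", "See proof."),
    Problem("Solve x^2 = 4.", "2 or -2"),
]
kept = [p for p in candidates if keep(p)]
```

In this toy run only the first candidate survives: the others are rejected as multiple-choice, yes/no, a proof request, and a non-unique answer, respectively. Keyword heuristics like these will misclassify some problems, which is one reason a manual check of each filtering step matters.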
Taxonomy
Topics: Reinforcement Learning in Robotics
