Measuring Mathematical Problem Solving With the MATH Dataset

Dan Hendrycks; Collin Burns; Saurav Kadavath; Akul Arora; Steven Basart; Eric Tang; Dawn Song; Jacob Steinhardt

arXiv:2103.03874·cs.LG·November 10, 2021·275 cites

TL;DR

This paper introduces the MATH dataset of 12,500 challenging math problems with solutions to evaluate and improve machine learning models' mathematical reasoning, highlighting current limitations despite scaling efforts.

Contribution

The paper presents the MATH dataset and an auxiliary pretraining dataset, providing new benchmarks and insights into the challenges of scaling models for mathematical problem solving.

Findings

01 Accuracy remains low even with large Transformer models.

02 Scaling models alone is insufficient for advanced mathematical reasoning.

03 New algorithmic approaches are needed beyond scaling Transformers.

Abstract

Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue.…
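As a rough illustration of what "accuracy on MATH" could mean in practice, the sketch below scores generated solutions by exact match on their final answers. It assumes, beyond what the abstract states, that each solution marks its final answer with a LaTeX \boxed{...} command; the function names extract_boxed and exact_match_accuracy are hypothetical and are not taken from the paper's code.

```python
from typing import Optional, Sequence


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, or None.

    Assumption: the final answer is wrapped in \\boxed{...}. Braces are
    matched by depth so nested LaTeX such as \\frac{1}{2} is kept whole.
    """
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def exact_match_accuracy(predictions: Sequence[str],
                         references: Sequence[str]) -> float:
    """Fraction of problems whose predicted final answer equals the reference.

    Comparison is plain string equality on the extracted answers; a real
    evaluation would likely normalize equivalent forms first.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_boxed(pred)
        if answer is not None and answer == extract_boxed(ref):
            correct += 1
    return correct / len(references)
```

Plain string equality is the simplest possible choice here: it treats equivalent answers written differently (for example 0.5 and \frac{1}{2}) as mismatches, so it gives a conservative estimate under these assumptions.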

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Attention Is All You Need · Dropout · Residual Connection · Adam · Byte Pair Encoding