PutnamBench: Evaluating Neural Theorem-Provers on the Putnam   Mathematical Competition

George Tsoukalas; Jasper Lee; John Jennings; Jimmy Xin; Michelle Ding,; Michael Jennings; Amitayush Thakur; Swarat Chaudhuri

arXiv:2407.11214·cs.AI·November 5, 2024·1 cites

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding,, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri

PDF

Open Access 2 Repos 5 Datasets 1 Video

TL;DR

PutnamBench is a challenging new benchmark with formalized competition math problems in multiple proof assistants, designed to evaluate neural theorem-provers' capabilities on undergraduate-level mathematics.

Contribution

It introduces a comprehensive, multi-language benchmark of formalized competition problems, providing a new standard for evaluating neural theorem-provers.

Findings

01

Existing neural theorem-provers solve only a few problems

02

PutnamBench is a difficult open challenge for AI research

03

The benchmark covers broad undergraduate mathematics topics

Abstract

We present PutnamBench, a new multi-language benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1692 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the problems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. PutnamBench requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Datasets

Videos

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition· slideslive

Taxonomy

TopicsNeural Networks and Applications · Model Reduction and Neural Networks