First Proof

Mohammed Abouzaid; Andrew J. Blumberg; Martin Hairer; Joe Kileel; Tamara G. Kolda; Paul D. Nelson; Daniel Spielman; Nikhil Srivastava; Rachel Ward; Shmuel Weinberger; and Lauren Williams

arXiv:2602.05192 · cs.AI · March 17, 2026

Open Access

TL;DR

This paper introduces a set of ten research-level mathematics questions to evaluate current AI systems' problem-solving capabilities, providing a new benchmark for assessing AI performance in advanced mathematics.

Contribution

It presents a novel, publicly shared set of research-level math questions that serve as a benchmark for testing AI systems' mathematical reasoning.

Findings

01 Questions are challenging for current AI systems

02 Provides a new benchmark for AI mathematical reasoning

03 Encourages development of more advanced AI in mathematics

Abstract

To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.

Taxonomy

Topics: Mathematics, Computing, and Information Processing · History and Theory of Mathematics · Computability, Logic, AI Algorithms