HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification
Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, Kalyan Ramakrishnan, Flaviu Cipcigan, Philip Torr, Alessandro Abate

TL;DR
HorizonMath is a benchmark of over 100 predominantly unsolved mathematical problems that tests whether AI can make novel discoveries, using automated verification to check candidate solutions proposed by large language models such as GPT-5.4.
Contribution
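The paper introduces an open-source benchmark and evaluation framework for assessing AI-driven mathematical discovery on unsolved problems. Because the solutions are unknown, the benchmark avoids data contamination, and automated verification replaces manual review.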
Findings
GPT-5.4 Pro proposed solutions to two problems that improve on known results.
HorizonMath enables scalable, automated evaluation of AI's mathematical reasoning.
AI-generated solutions have the potential to contribute novel results to mathematics.
Abstract
Can AI make progress on important, unsolved mathematical problems? Large language models are now capable of sophisticated mathematical and scientific reasoning, but whether they can perform novel research is still widely debated and underexplored. We introduce HorizonMath, a benchmark of over 100 predominantly unsolved problems spanning 8 domains in computational and applied mathematics, paired with an open-source evaluation framework for automated verification. Our benchmark targets a class of problems where discovery is hard, requiring meaningful mathematical insight, but verification is computationally efficient and simple. Because these solutions are unknown, HorizonMath is immune to data contamination, and most state-of-the-art models score near 0%. Existing research-level benchmarks instead rely on formal proof verification or manual review, both of which are expensive to scale.…
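The asymmetry the abstract describes (hard to discover, cheap to verify) can be illustrated with a small sketch. The example below is not taken from HorizonMath or its evaluation framework; it uses Golomb rulers as a stand-in problem, and the function names `is_golomb_ruler` and `verify_candidate` are hypothetical. Finding a shorter ruler for a given number of marks is a hard search problem, while checking a proposed ruler only requires comparing pairwise differences.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if all pairwise differences between marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


def verify_candidate(marks, order, best_known_length):
    """Accept a candidate only if it is a valid ruler with `order` marks
    and is strictly shorter than the best known length.

    Hypothetical verifier for illustration; not the paper's code.
    """
    distinct = sorted(set(marks))
    # Reject malformed submissions: wrong mark count or repeated marks.
    if order < 2 or len(marks) != order or len(distinct) != order:
        return False
    if not is_golomb_ruler(distinct):
        return False
    # An improvement must beat the reference length, not merely match it.
    return distinct[-1] - distinct[0] < best_known_length
```

For instance, `verify_candidate([0, 1, 4, 6], 4, 7)` accepts the ruler because it is valid and has length 6, while the same ruler is rejected against a reference length of 6 because it only ties it. The check runs in time quadratic in the number of marks, regardless of how much search was needed to find the candidate.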
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Machine Learning in Materials Science · Model Reduction and Neural Networks
