Advancing Mathematics Research with AI-Driven Formal Proof Search

George Tsoukalas; Anton Kovsharov; Sergey Shirobokov; Anja Surina; Moritz Firsching; Gergely B\'erczi; Francisco J. R. Ruiz; Arun Suggala; Adam Zsolt Wagner; Eric Wieser; Lei Yu; Aja Huang; Mikl\'os Z. Horv\'ath; Andrew Ferrauiolo; Henryk Michalewski; Codrut Grosu; Thomas Hubert; Matej Balog; Pushmeet Kohli; Swarat Chaudhuri

arXiv:2605.22763·cs.AI·May 22, 2026

Advancing Mathematics Research with AI-Driven Formal Proof Search

George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Anja Surina, Moritz Firsching, Gergely B\'erczi, Francisco J. R. Ruiz, Arun Suggala, Adam Zsolt Wagner, Eric Wieser, Lei Yu, Aja Huang, Mikl\'os Z. Horv\'ath, Andrew Ferrauiolo, Henryk Michalewski, Codrut Grosu, Thomas Hubert

PDF

TL;DR

This paper evaluates AI-driven formal proof search using large language models in mathematics, demonstrating its potential to solve open problems across various fields with cost-effective methods.

Contribution

First large-scale assessment of LLMs generating formal proofs for open problems, highlighting effective agent designs and practical deployment in mathematical research.

Findings

01

Autonomous agent solved 9 Erdős problems out of 353

02

Proved 44 out of 492 OEIS conjectures

03

AI-aided proof search shows significant potential in mathematics

Abstract

Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erd\H{o}s problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erd\H{o}s successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.