Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Moritz Firsching, Paul Lezeau, Salvatore Mercuri, Mikl\'os Z. Horv\'ath, Ya\"el Dillies, Calle S\"onne, Eric Wieser, Fred Zhang, Thomas Hubert, Blaise Ag\"uera y Arcas, Pushmeet Kohli

TL;DR
The paper introduces Formal Conjectures, a dynamic benchmark of formalized mathematical problems in Lean 4, aimed at evaluating and advancing automated reasoning systems in research-level mathematics.
Contribution
It presents a large, evolving dataset of formalized conjectures and solved problems, with a structured interface connecting mathematicians and AI systems, and provides baseline evaluation results.
Findings
Benchmark includes 2615 problems with 1029 open conjectures.
Demonstrated utility by solving open research conjectures.
Provides a standardized evaluation setup with baseline results.
Abstract
As automated reasoning systems advance rapidly, there is a growing need for research-level formal mathematical problems to accurately evaluate their capabilities. To address this, we present Formal Conjectures, an evolving benchmark of currently 2615 mathematical problem statements formalized in Lean 4. Sourced from areas of active mathematical research, the dataset features 1029 open research conjectures providing a zero-contamination benchmark for mathematical proof discovery, and 836 solved problems for proof autoformalization. Notably, the repository provides a structured interface connecting mathematicians who formalize and clarify problems with the AI systems and humans attempting to solve them. Demonstrating its immediate utility, the benchmark has already been leveraged to make new mathematical discoveries, including the resolution of open research conjectures. We describe our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
