Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs
Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai

TL;DR
Prover Agent is an innovative AI framework combining large language models with formal proof assistants to enhance automated theorem proving, achieving state-of-the-art results on benchmark datasets.
Contribution
It introduces a novel agent-based system that integrates informal reasoning, formal proof, and auxiliary lemma generation to improve theorem proving success.
Findings
Achieves 88.1% success on MiniF2F
Solves 25 problems on PutnamBench with fewer resources
Establishes new state-of-the-art among small language model methods
Abstract
We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas. These auxiliary lemmas are not limited to subgoals in the formal proof but can also include special cases or potentially useful facts derived from the assumptions, which help in discovering a viable proof strategy. It achieves an 88.1% success rate on MiniF2F and solves 25 problems on the PutnamBench with a smaller sample budget than previous approaches, establishing a new state-of-the-art on both benchmarks among methods using small language models (SLMs). We also present theoretical analyses and case studies that illustrate how these generated lemmas contribute to solving challenging problems. Our…
Peer Reviews
Decision·Submitted to ICLR 2026
1. The paper has very strong results, getting SOTA on MiniF2F among small language models. 2. The paper is very well-written.
1. It would be nice to compare more in-depth to other works that also generate auxiliary lemmas in formal theorem proving? 1. For example, Seed Prover also performs a somewhat open-ended exploration of potentially useful lemmas. 2. In particular the “heavy” inference setting they use has some breadth in terms of the conjectures it generates.
The paper proposes a well-motivated pipeline which uses auxiliary lemmas in a flexible and interesting way. Prior work use auxiliary lemmas primarily for subgoal decomposition (e.g., POETRY), whereas Prover Agent can use lemmas to explore special or related cases to the current theorem. While those might not be directly invoked in the final proof, they can help the agent to get signal on what proof strategies seem to work for the given problem. They can also, of course, end up serving as useful
While the empirical result is generally positive, it is (1) currently limited to minif2f, and (2) small in absolute terms. I think minif2f served well as a benchmark for several years, but the baseline results at all scales have improved significantly, so it might not represent the biggest outstanding challenges in formal theorem proving anymore. For instance, Goedel Prover V2 already achieves 85.2% of success rate in Pass@256. With Prover Agent, this improves to 86.5% (or 88.1% with ensembling
- Addresses an important problem in automated theorem proving by bridging informal and formal reasoning - Achieves strong empirical results (88.1% on MiniF2F) with reportedly lower sample complexity than prior work - The approach of generating auxiliary lemmas to decompose complex proofs is intuitive and potentially valuable
- Theoretical analysis lacks practical relevance (Section 4): The theoretical framework appears disconnected from the empirical contributions. The assumptions made seem uncheckable and are introduced primarily to enable mathematical tractability rather than to provide meaningful insights. The analysis does not quantify the effectiveness of the approach in a way that connects to the experimental results. Unless this analysis can be meaningfully tied to the empirical performance, it should be remo
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing · Logic, programming, and type systems · Open Education and E-Learning
