APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh

TL;DR
APOLLO is a modular framework that combines Lean's formal verification with LLM reasoning to significantly improve automated theorem proving accuracy and efficiency, especially with small models and limited sampling budgets.
Contribution
This work introduces APOLLO, a novel, model-agnostic system that automates proof repair and verification, achieving state-of-the-art results with low resource consumption.
Findings
Achieves 84.9% accuracy on miniF2F benchmark with sub-8B models.
Raises Goedel-Prover-SFT accuracy to 65.6%.
Improves general-purpose model accuracy from 3-7% to over 40%.
Abstract
Formal reasoning and automated theorem proving constitute a challenging subfield of machine learning, in which machines are tasked with proving mathematical theorems using formal languages like Lean. A formal verification system can check whether a formal proof is correct or not almost instantaneously, but generating a completely correct formal proof with large language models (LLMs) remains a formidable task. The usual approach in the literature is to prompt the LLM many times (up to several thousands) until one of the generated proofs passes the verification system. In this work, we present APOLLO (Automated PrOof repair viaLLM and Lean cOllaboration), a modular, model-agnostic agentic framework that combines the strengths of the Lean compiler with an LLM's reasoning abilities to achieve better proof-generation results at a low token and sampling budgets. Apollo directs a fully…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsLogic, programming, and type systems · Mathematics, Computing, and Information Processing · Machine Learning in Materials Science
MethodsAdaptive Parameter-wise Diagonal Quasi-Newton Method · Sparse Evolutionary Training
