APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Azim Ospanov; Farzan Farnia; Roozbeh Yousefzadeh

arXiv:2505.05758·cs.AI·November 5, 2025

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh

PDF

Open Access 1 Video

TL;DR

APOLLO is a modular framework that combines Lean's formal verification with LLM reasoning to significantly improve automated theorem proving accuracy and efficiency, especially with small models and limited sampling budgets.

Contribution

This work introduces APOLLO, a novel, model-agnostic system that automates proof repair and verification, achieving state-of-the-art results with low resource consumption.

Findings

01

Achieves 84.9% accuracy on miniF2F benchmark with sub-8B models.

02

Raises Goedel-Prover-SFT accuracy to 65.6%.

03

Improves general-purpose model accuracy from 3-7% to over 40%.

Abstract

Formal reasoning and automated theorem proving constitute a challenging subfield of machine learning, in which machines are tasked with proving mathematical theorems using formal languages like Lean. A formal verification system can check whether a formal proof is correct or not almost instantaneously, but generating a completely correct formal proof with large language models (LLMs) remains a formidable task. The usual approach in the literature is to prompt the LLM many times (up to several thousands) until one of the generated proofs passes the verification system. In this work, we present APOLLO (Automated PrOof repair viaLLM and Lean cOllaboration), a modular, model-agnostic agentic framework that combines the strengths of the Lean compiler with an LLM's reasoning abilities to achieve better proof-generation results at a low token and sampling budgets. Apollo directs a fully…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning· slideslive

Taxonomy

TopicsLogic, programming, and type systems · Mathematics, Computing, and Information Processing · Machine Learning in Materials Science

MethodsAdaptive Parameter-wise Diagonal Quasi-Newton Method · Sparse Evolutionary Training