STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
Kefan Dong, Tengyu Ma

TL;DR
STP introduces a self-play framework for LLM-based theorem proving, where conjecturing and proving mutually improve, significantly enhancing proof success rates and achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper presents a novel self-play approach for LLM theorem provers, enabling iterative conjecture generation and proof, leading to substantial performance improvements over prior methods.
Findings
Proves 28.5% of statements in LeanWorkbook, doubling previous best.
Achieves state-of-the-art on miniF2F-test, Proofnet-test, and PutnamBench.
Generated 51.3 billion tokens during training.
Abstract
A fundamental challenge in formal theorem proving by LLMs is the lack of high-quality training data. Although reinforcement learning or expert iteration partially mitigates this issue by alternating between LLM generating proofs and finetuning them on correctly generated ones, performance quickly plateaus due to the scarcity of correct proofs (sparse rewards). To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The conjecturer is trained iteratively on previously generated conjectures that are barely provable by the current prover, which…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsAuction Theory and Applications
