InternLM2.5-StepProver: Advancing Automated Theorem Proving via Critic-Guided Search
Zijian Wu, Suozhi Huang, Zhejian Zhou, Huaiyuan Ying, Zheng Yuan, Wenwei Zhang, Dahua Lin, Kai Chen

TL;DR
This paper introduces InternLM2.5-StepProver, a novel approach that employs a critic-guided search method to enhance automated theorem proving with large language models, leading to significant performance improvements.
Contribution
It proposes a critic-guided search framework for LLM-based theorem proving and demonstrates substantial performance gains through large-scale expert iteration.
Findings
Proposed critic-guided search improves proof success rate from 59.4% to 65.9%.
Large-scale expert iteration with 20,000 CPU days enhances model performance.
Open-sourced models and proof data for community use.
Abstract
Large Language Models (LLMs) have emerged as powerful tools in mathematical theorem proving, particularly when utilizing formal languages such as LEAN. A prevalent proof method involves the LLM prover iteratively constructing the proof tactic by tactic, typically following a best-first search scheme. However, this method often ignores the critical preference information inside the existing tactic trajectories, hindering the search for deeper proofs. We propose an intuitive yet effective method, which utilizes a critic model to capture the preference information and to guide the search of the prover model at runtime. Given the prover-critic framework, a large-scale expert iteration with more than 20,000 CPU days is then applied to further fine-tune the prover and the critic. The trained InternLM2.5-StepProver critic significantly boosts the performance of the prover model (59.4% to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning and Data Classification · AI-based Problem Solving and Planning
