Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning
Xingguang Ji, Yahui Liu, Qi Wang, Jingyuan Zhang, Yang Yue, Rui Shi, Chenxi Sun, Fuzheng Zhang, Guorui Zhou, Kun Gai

TL;DR
Leanabell-Prover-V2 is a reinforcement learning-enhanced language model that integrates verifier feedback to improve formal theorem proving in Lean 4, demonstrating notable performance gains on the MiniF2F benchmark.
Contribution
It introduces a verifier-integrated RL training method for LLMs to improve formal proof generation, building upon and enhancing previous prover models.
Findings
Improves pass@128 performance by 3.2% with Kimina-Prover-Preview-Distill-7B.
Achieves a 2.0% increase with DeepSeek-Prover-V2-7B.
Demonstrates the effectiveness of verifier feedback in self-aware reasoning correction.
Abstract
We introduce our Leanabell-Prover-V2, a 7B large language models (LLMs) that can produce formal theorem proofs in Lean 4, with verifier-integrated Long Chain-of-Thoughts (CoT). Following our previous work Leanabell-Prover-V1, we continual to choose to posttrain existing strong prover models for further performance improvement. In our V2 version, we mainly upgrade the Reinforcement Learning (RL) with feedback provided by the Lean 4 verifier. Crucially, verifier feedback, such as indicating success or detailing specific errors, allows the LLM to become ``self-aware'' of the correctness of its own reasoning process and learn to reflexively correct errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments show that Leanabell-Prover-V2…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
