Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

Xingguang Ji; Yahui Liu; Qi Wang; Jingyuan Zhang; Yang Yue; Rui Shi; Chenxi Sun; Fuzheng Zhang; Guorui Zhou; Kun Gai

arXiv:2507.08649·cs.AI·July 14, 2025

Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

Xingguang Ji, Yahui Liu, Qi Wang, Jingyuan Zhang, Yang Yue, Rui Shi, Chenxi Sun, Fuzheng Zhang, Guorui Zhou, Kun Gai

PDF

TL;DR

Leanabell-Prover-V2 is a reinforcement learning-enhanced language model that integrates verifier feedback to improve formal theorem proving in Lean 4, demonstrating notable performance gains on the MiniF2F benchmark.

Contribution

It introduces a verifier-integrated RL training method for LLMs to improve formal proof generation, building upon and enhancing previous prover models.

Findings

01

Improves pass@128 performance by 3.2% with Kimina-Prover-Preview-Distill-7B.

02

Achieves a 2.0% increase with DeepSeek-Prover-V2-7B.

03

Demonstrates the effectiveness of verifier feedback in self-aware reasoning correction.

Abstract

We introduce our Leanabell-Prover-V2, a 7B large language models (LLMs) that can produce formal theorem proofs in Lean 4, with verifier-integrated Long Chain-of-Thoughts (CoT). Following our previous work Leanabell-Prover-V1, we continual to choose to posttrain existing strong prover models for further performance improvement. In our V2 version, we mainly upgrade the Reinforcement Learning (RL) with feedback provided by the Lean 4 verifier. Crucially, verifier feedback, such as indicating success or detailing specific errors, allows the LLM to become ``self-aware'' of the correctness of its own reasoning process and learn to reflexively correct errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments show that Leanabell-Prover-V2…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.