LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving
Guoxiong Gao, Zeming Sun, Jiedong Jiang, Yutong Wang, Jingda Xu, Peihao Wu, Bryan Dai, Bin Dong

TL;DR
LeanSearch v2 is a dual-mode retrieval system that significantly improves premise retrieval for Lean 4 theorem proving, enabling more effective proof automation and outperforming existing tools.
Contribution
It introduces a novel two-mode retrieval system with state-of-the-art single-query and global premise retrieval capabilities, open-sourced for the first time.
Findings
State-of-the-art single-query retrieval (nDCG@10 of 0.62)
Global premise retrieval recovers 46.1% of ground-truth premise groups
Improves proof success rate to 20% in downstream evaluation.
Abstract
Proving theorems in Lean 4 often requires identifying a scattered set of library lemmas whose joint use enables a concise proof -- a task we call global premise retrieval. Existing tools address adjacent problems: semantic search engines find individual declarations matching a query, while premise-selection systems predict useful lemmas one tactic step at a time. Neither recovers the full premise set an entire theorem requires. We present LeanSearch v2, a two-mode retrieval system for this task. Its standard mode applies a hierarchy-informalized Mathlib corpus with an embedding-reranker pipeline, achieving state-of-the-art single-query retrieval without domain-specific fine-tuning (nDCG@10 of 0.62 vs. 0.53 for the next-best system). Its reasoning mode builds on standard mode as its retrieval substrate, targeting global premise retrieval through iterative sketch-retrieve-reflect cycles.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
