LeanExplore: A search engine for Lean 4 declarations
Justin Asher (Independent Researcher)

TL;DR
LeanExplore is a semantic search engine for Lean 4 declarations that combines formal and informal statement retrieval with ranking strategies, enhancing navigation and AI integration in Lean 4's extensive libraries.
Contribution
This paper introduces LeanExplore, a novel search engine that combines semantic embeddings, lexical relevance, and declaration importance to improve navigation in Lean 4 libraries.
Findings
Effective semantic search across multiple Lean 4 packages.
Integration with LLMs for AI-assisted theorem proving.
Open-source database for self-hosted deployment.
Abstract
The expanding Lean 4 ecosystem poses challenges for navigating its vast libraries. This paper introduces LeanExplore, a search engine for Lean 4 declarations. LeanExplore enables users to semantically search for statements, both formally and informally, across select Lean 4 packages (including Batteries, Init, Lean, Mathlib, PhysLean, and Std). This search capability is powered by a hybrid ranking strategy, integrating scores from a multi-source semantic embedding model (capturing conceptual meaning from formal Lean code, docstrings, AI-generated informal translations, and declaration titles), BM25+ for keyword-based lexical relevance, and a PageRank-based score reflecting declaration importance and interconnectedness. The search engine is accessible via a dedicated website (https://www.leanexplore.com/) and a Python API (https://github.com/justincasher/lean-explore). Furthermore, the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBig Data and Business Intelligence · Quality and Supply Management · Collaboration in agile enterprises
