Combining Textual and Structural Information for Premise Selection in Lean
Job Petrov\v{c}i\v{c}, David Eliecer Narvaez Denis, Ljup\v{c}o Todorovski

TL;DR
This paper introduces a graph-augmented method combining text embeddings and graph neural networks to improve premise selection in Lean theorem proving, significantly outperforming baseline methods by leveraging relational dependency information.
Contribution
The paper proposes a novel approach that integrates textual embeddings with graph neural networks over dependency graphs for premise selection in formal theorem proving.
Findings
Outperforms ReProver baseline by over 25% on LeanDojo Benchmark
Relational dependency information improves premise selection accuracy
Graph-augmented approach effectively leverages premise dependencies
Abstract
Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state-premise and premise-premise relations. On the LeanDojo Benchmark, our method outperforms the ReProver language-based baseline by over 25\% across standard retrieval metrics. These results suggest that relational information is beneficial for premise selection.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
