Combining Textual and Structural Information for Premise Selection in Lean

Job Petrov\v{c}i\v{c}; David Eliecer Narvaez Denis; Ljup\v{c}o Todorovski

arXiv:2510.23637·cs.LG·December 2, 2025

Combining Textual and Structural Information for Premise Selection in Lean

Job Petrov\v{c}i\v{c}, David Eliecer Narvaez Denis, Ljup\v{c}o Todorovski

PDF

TL;DR

This paper introduces a graph-augmented method combining text embeddings and graph neural networks to improve premise selection in Lean theorem proving, significantly outperforming baseline methods by leveraging relational dependency information.

Contribution

The paper proposes a novel approach that integrates textual embeddings with graph neural networks over dependency graphs for premise selection in formal theorem proving.

Findings

01

Outperforms ReProver baseline by over 25% on LeanDojo Benchmark

02

Relational dependency information improves premise selection accuracy

03

Graph-augmented approach effectively leverages premise dependencies

Abstract

Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state-premise and premise-premise relations. On the LeanDojo Benchmark, our method outperforms the ReProver language-based baseline by over 25\% across standard retrieval metrics. These results suggest that relational information is beneficial for premise selection.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.