CSLibPremiseBench: Structure-Guided Premise Retrieval and Label Robustness for Lean 4 Computer-Science Theorems

Junye Ji

arXiv:2605.14549·cs.LO·May 15, 2026

CSLibPremiseBench: Structure-Guided Premise Retrieval and Label Robustness for Lean 4 Computer-Science Theorems

Junye Ji

PDF

TL;DR

This paper introduces CSLibPremiseBench, a benchmark for premise retrieval in Lean 4's CSLib library, evaluating various heuristics and a structure-guided reranker, highlighting the influence of repository structure and proxy labels.

Contribution

It provides a reproducible benchmark and empirical analysis of premise retrieval methods specific to CSLib, emphasizing the impact of repository design and label robustness.

Findings

01

CSG-Rerank offers modest early-rank MRR improvements over BM25.

02

Proxy labels require explicit caveats due to their limitations.

03

Repository structure significantly influences premise retrieval performance.

Abstract

CSLib is an emerging Lean 4 library for computer-science formalization, but its premise-retrieval behavior is not well represented by broad mathematical theorem-proving benchmarks. We introduce CSLibPremiseBench, a reproducible CSLib-specific benchmark and empirical study for source-visible premise retrieval over Lean 4 theorem and lemma declarations. The benchmark pins CSLib v4.29.0 at commit 0d37cc7fcc985cfc53b155e7eef2453f846c6da2, builds with Lean 4.29.0, and evaluates a strict import/source-order task set with 801 proxy-labelable tasks and 1875 CSLib candidate declarations. The labels are source-visible CSLib proof-reference proxies, not elaborated Lean dependency traces. We audit label robustness using stricter source-visible matching and a 300-task Lean environment expression probe, then compare BM25, symbol/name overlap, namespace/module and import-graph heuristics,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.