VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean
Yutong Xin, Qiaochu Chen, Greg Durrett, I\c{s}il Dillig

TL;DR
VeriSoftBench is a new benchmark of 500 Lean 4 proof obligations from software verification, highlighting challenges in proof automation transferability and the importance of repository context in proof success.
Contribution
Introduces VeriSoftBench, a repository-scale benchmark for Lean proofs in software verification, with insights on proof automation transferability and dependency context importance.
Findings
Provers tuned for Mathlib-style mathematics transfer poorly to repository-centric proofs.
Proof success correlates with the size of dependency closures.
Providing curated dependency context improves proof automation performance.
Abstract
Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebases with substantial project-specific libraries. We introduce VeriSoftBench, a benchmark of 500 Lean 4 proof obligations drawn from open-source formal-methods developments and packaged to preserve realistic repository context and cross-file dependencies. Our evaluation of frontier LLMs and specialized provers yields three observations. First, provers tuned for Mathlib-style mathematics transfer poorly to this repository-centric setting. Second, success is strongly correlated with transitive repository dependence: tasks whose proofs draw on large, multi-hop dependency closures are less likely…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLogic, programming, and type systems · Mathematics, Computing, and Information Processing · Model-Driven Software Engineering Techniques
