miniCodeProps: a Minimal Benchmark for Proving Code Properties

Evan Lohn; Sean Welleck

arXiv:2406.11915·cs.SE·October 11, 2024·2 cites

miniCodeProps: a Minimal Benchmark for Proving Code Properties

Evan Lohn, Sean Welleck

PDF

Open Access

TL;DR

miniCodeProps is a new benchmark with 201 program specifications in Lean, designed to evaluate the ability of AI-based theorem provers to verify simple programs, revealing current limitations of neural provers on more complex proofs.

Contribution

The paper introduces miniCodeProps, a benchmark dataset for testing automated theorem proving on verified code, highlighting current neural provers' challenges with medium and hard properties.

Findings

01

State-of-the-art provers succeed on easy properties

02

Most medium and hard properties remain unproven by current methods

03

miniCodeProps is publicly available for future research

Abstract

AI agents have shown initial promise in automating mathematical theorem proving in proof assistants such as Lean. The same proof assistants can be used to verify the correctness of code by pairing code with specifications and proofs that the specifications hold. Automating the writing of code, specifications, and proofs could lower the cost of verification, or, ambitiously, enable an AI agent to output safe, provably correct code. However, it remains unclear whether current neural theorem provers can automatically verify even relatively simple programs. We present miniCodeProps, a benchmark of 201 program specifications in the Lean proof assistant, aimed at the subproblem of automatically generating a proof for a provided program and specification. miniCodeProps contains specifications about simple, self-contained programs (e.g., lists, natural numbers, binary trees) with varied proof…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsFormal Methods in Verification · Software Testing and Debugging Techniques · VLSI and Analog Circuit Testing