Agentic Proof Automation: A Case Study

Yichen Xu; Martin Odersky

arXiv:2601.03768·cs.PL·January 8, 2026

Agentic Proof Automation: A Case Study

Yichen Xu, Martin Odersky

PDF

Open Access

TL;DR

This paper demonstrates how large language model-based agents can automate significant portions of proof engineering in formal systems, reducing human effort while highlighting current limitations in creative reasoning.

Contribution

It introduces the concept of agentic proof automation and provides a case study showing agents can successfully handle most proof tasks with minimal human input.

Findings

01

Agents completed 189 proof tasks with 87% success rate

02

Only 16% of tasks required human intervention

03

Agents significantly boost proof engineering productivity

Abstract

Proof engineering is notoriously labor-intensive: proofs that are straightforward on paper often require lengthy scripts in theorem provers. Recent advances in large language models (LLMs) create new opportunities for proof automation: modern LLMs not only generate proof scripts, but also support agentic behavior, exploring codebases and iteratively refining their outputs against prover feedback. These advances enable an emerging scheme where LLM-based agents undertake most proof engineering under human guidance. Humans provide mathematical insight (definitions, theorems, proof strategies); agents handle the mechanical work of proof development. We call this scheme agentic proof automation. We present this scheme through a case study: mechanizing the semantic type soundness of a sophisticated formal system, System Capless, in Lean 4, comprising over 14,000 lines of code. Using…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsLogic, programming, and type systems · Multi-Agent Systems and Negotiation · Formal Methods in Verification