ImProver: Agent-Based Automated Proof Optimization

Riyaz Ahuja; Jeremy Avigad; Prasad Tetali; Sean Welleck

arXiv:2410.04753·cs.AI·May 22, 2026

ImProver: Agent-Based Automated Proof Optimization

Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck

PDF

2 Repos 1 Video 3 Reviews

TL;DR

ImProver is an innovative agent leveraging large language models to automatically optimize formal proofs in Lean, enhancing their length, readability, and modularity according to user-defined criteria.

Contribution

The paper introduces ImProver, a novel LLM-based proof rewriting system that incorporates symbolic context, error correction, and retrieval to optimize proofs for various criteria.

Findings

01

ImProver can produce shorter, more readable proofs.

02

The system effectively rewrites real-world mathematical proofs.

03

ImProver outperforms naive LLM applications in proof optimization.

Abstract

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proofs assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs…

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01Rating 6Confidence 4

Strengths

This paper first discovers a new field of automated proof optimization, and conducts a pioneering attempt in this field, marking the first significant effort to use LLMs for enhancing formal proofs with respect to certain metrics in proof assistants like Lean. A notable strength of this work is its emphasis on the role of proof states in guiding LLMs. By introducing the Chain-of-States technique, which provides intermediate proof states to the LLM, the authors demonstrate an understanding of how

Weaknesses

- The method proposed in this paper is well-suited for metrics that can be directly computed from the formal proof. However, this limits its applicability to more complex metrics such as readability. The paper defines readability as the ratio of explicitly typed have tactics to the total number of tactic invocations in line 130. This simple definition may lead to a proof with numerous unused have tactics being considered more readable than a normal proof. The reviewer considers readability to be

Reviewer 02Rating 5Confidence 3

Strengths

+ The paper tackles the proof optimization, which has not been considered yet in neural theorem proving, and proposes a (simple) approach to it. + The proposed approach can outperform the baseline, which simply queries to an LLM (GPT-4o).

Weaknesses

- Because the proposed approach to the proof optimization (described in Section 3) is simple, I find the main contribution of the paper is in applying LLMs to proof optimization, a problem in theorem proving. When evaluating the paper from such a viewpoint, I admit the proof optimization problem may be worth studying, but I don't think the paper argues its value in a convincing manner. The second paragraph of Section 1 discusses the importance of the proof optimization, but there is no reference

Reviewer 03Rating 8Confidence 4

Strengths

The authors frame an interesting and potentially very impactful problem, propose a smart solution, and execute on that solution well. The paper is well-written and the results appear convincing to me.

Weaknesses

The authors could mention interactive proofs / prover-verifier games (https://arxiv.org/abs/2108.12099), where LLMs are trained to write informal math proofs to be maximally legible to an independent verifier, as related work. I believe the caption of Figure 1 should read (left) instead of (right)? For reproducibility, the authors should mention the exact version of GPT-4o they used for their experiments (gpt-4o-2024-08-06 ?) I’m taking it on trust that “readability” defined as in the paper i

Code & Models

Repositories

Videos

ImProver: Agent-Based Automated Proof Optimization· slideslive

Taxonomy

TopicsModeling, Simulation, and Optimization · Scheduling and Optimization Algorithms · Multi-Agent Systems and Negotiation