TL;DR
ImProver is an innovative agent leveraging large language models to automatically optimize formal proofs in Lean, enhancing their length, readability, and modularity according to user-defined criteria.
Contribution
The paper introduces ImProver, a novel LLM-based proof rewriting system that incorporates symbolic context, error correction, and retrieval to optimize proofs for various criteria.
Findings
ImProver can produce shorter, more readable proofs.
The system effectively rewrites real-world mathematical proofs.
ImProver outperforms naive LLM applications in proof optimization.
Abstract
Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proofs assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs…
Peer Reviews
Decision·ICLR 2025 Poster
This paper first discovers a new field of automated proof optimization, and conducts a pioneering attempt in this field, marking the first significant effort to use LLMs for enhancing formal proofs with respect to certain metrics in proof assistants like Lean. A notable strength of this work is its emphasis on the role of proof states in guiding LLMs. By introducing the Chain-of-States technique, which provides intermediate proof states to the LLM, the authors demonstrate an understanding of how
- The method proposed in this paper is well-suited for metrics that can be directly computed from the formal proof. However, this limits its applicability to more complex metrics such as readability. The paper defines readability as the ratio of explicitly typed have tactics to the total number of tactic invocations in line 130. This simple definition may lead to a proof with numerous unused have tactics being considered more readable than a normal proof. The reviewer considers readability to be
+ The paper tackles the proof optimization, which has not been considered yet in neural theorem proving, and proposes a (simple) approach to it. + The proposed approach can outperform the baseline, which simply queries to an LLM (GPT-4o).
- Because the proposed approach to the proof optimization (described in Section 3) is simple, I find the main contribution of the paper is in applying LLMs to proof optimization, a problem in theorem proving. When evaluating the paper from such a viewpoint, I admit the proof optimization problem may be worth studying, but I don't think the paper argues its value in a convincing manner. The second paragraph of Section 1 discusses the importance of the proof optimization, but there is no reference
The authors frame an interesting and potentially very impactful problem, propose a smart solution, and execute on that solution well. The paper is well-written and the results appear convincing to me.
The authors could mention interactive proofs / prover-verifier games (https://arxiv.org/abs/2108.12099), where LLMs are trained to write informal math proofs to be maximally legible to an independent verifier, as related work. I believe the caption of Figure 1 should read (left) instead of (right)? For reproducibility, the authors should mention the exact version of GPT-4o they used for their experiments (gpt-4o-2024-08-06 ?) I’m taking it on trust that “readability” defined as in the paper i
Code & Models
Videos
Taxonomy
TopicsModeling, Simulation, and Optimization · Scheduling and Optimization Algorithms · Multi-Agent Systems and Negotiation
