StepProof: Step-by-step verification of natural language mathematical proofs
Xiaolin Hu, Qinghua Zhou, Bogdan Grechuk, Ivan Y. Tyukin

TL;DR
StepProof introduces a step-by-step autoformalization approach using large language models, enabling sentence-level verification of natural language mathematical proofs and improving verification success rates.
Contribution
It presents a novel method for granular proof verification that enhances autoformalization capabilities beyond complete proofs.
Findings
Significantly improves proof success rates
Enhances efficiency of proof verification
Manual adjustments further boost performance
Abstract
Interactive theorem provers (ITPs) are powerful tools for the formal verification of mathematical proofs down to the axiom level. However, their lack of a natural language interface remains a significant limitation. Recent advancements in large language models (LLMs) have enhanced the understanding of natural language inputs, paving the way for autoformalization - the process of translating natural language proofs into formal proofs that can be verified. Despite these advancements, existing autoformalization approaches are limited to verifying complete proofs and lack the capability for finer, sentence-level verification. To address this gap, we propose StepProof, a novel autoformalization method designed for granular, step-by-step verification. StepProof breaks down complete proofs into multiple verifiable subproofs, enabling sentence-level verification. Experimental results…
Peer Reviews
Decision·Submitted to ICLR 2025
- Granular Verification Approach: StepProof’s method of breaking down proofs into verifiable sub-propositions is a notable advancement compared to traditional methods that require verifying complete proofs. This could allow for a more targeted approach to error correction by addressing specific subproofs. - Improved Success Rates and Efficiency: The experimental findings indicate that StepProof improves both success rates and efficiency for proof verification over baseline methods. By enabling s
- Unsubstantiated Claim of Advantage: The claimed advantage of StepProof’s selective error correction (where only erroneous steps are retracted rather than the entire proof) is not unique to StepProof. Interactive theorem provers (ITPs) inherently support stepwise correction, enabling users to fix specific errors without requiring a full retraction. Thus, StepProof’s advantage in this aspect appears overstated. - Lack of Novelty in Stepwise Translation: Stepwise translation in autoformalization
+ The paper applies the known idea that, for LLMs, the step-by-step logical reasoning works more well than one-step reasoning, to autoformalization, especially the formal proof generation.
- It is convincing that the step-by-step reasoning works well, but I feel it natural and unsurprising because it has been proven that LLMs can better carry out step-by-step logical reasoning than one-step [1]. Thus, I'm unsure how significant the contribution of the paper, which seems to confirm step-by-step proof generation works more well than one-step generation, is. - Furthermore, it seems that the paper is not the first to apply the step-by-step reasoning power of LLMs to formal proof gener
N.A.
- The paper is almost impossible to comprehend. I highly suspect a big portion of it is generated by LLMs. - The full-proof strategy so baffling: I am not even sure what is formal backend and what is role of users here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLogic, programming, and type systems · Mathematics, Computing, and Information Processing · Formal Methods in Verification
