TL;DR
This paper introduces a novel learning-to-refine framework that enhances formal theorem provers by exploiting compiler-structured proof failure modes, significantly improving reasoning efficiency and performance.
Contribution
It presents a scalable, proof-guided learning method that leverages compiler-based compression to boost theorem prover capabilities without extensive test-time computation.
Findings
Achieves state-of-the-art performance on PutnamBench with large models.
Effectively amplifies reasoning abilities across different model scales.
Reduces proof exploration costs by local error correction.
Abstract
Large language models (LLMs) have demonstrated significant potential in formal theorem proving, yet state-of-the-art performance often necessitates prohibitive test-time compute via massive roll-outs or extended context windows. In this work, we address this scalability bottleneck by exploiting an informative structure in formal verification: the observation that compilers map a vast space of diverse proof attempts to a compact set of structured failure modes. We introduce a learning-to-refine framework that leverages this compression to perform efficient learning and proof exploration. We perform tree search that corrects errors locally conditioned on explicit verifier feedback, thereby circumventing the costs associated with accumulating a long history of proof attempts. Extensive evaluations show that our method consistently amplifies the reasoning capabilities of base provers across…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
