FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models
Zeynel A. Ulu\c{s}an, Burak S. Akbudak, Can S. Erer, G\"ozde G\"ul \c{S}ahin

TL;DR
This paper introduces FormalRewardBench, a new benchmark for evaluating reward models in formal theorem proving, highlighting the performance gap between general LLMs and specialized provers.
Contribution
It presents the first benchmark for reward model evaluation in formal theorem proving, with curated error injection strategies and comprehensive model assessments.
Findings
Frontier LLMs achieve 59.8% accuracy in proof evaluation.
Specialized theorem provers perform worse, at 24.4%.
Proof evaluation ability does not transfer well from theorem proving skills.
Abstract
Recent neural theorem provers use reinforcement learning with verifiable rewards (RLVR), where proof assistants provide binary correctness signals. While verifiable rewards are cheap and scalable without reward hacking issues, they suffer from sparse credit assignment: models receive no learning signal from difficult problems where partial progress goes unrewarded. This motivates learned reward models that can evaluate proof quality beyond binary verification. However, comparing reward models is challenging since it typically requires expensive RL training ablations. To address this, we introduce \textbf{FormalRewardBench}, the first benchmark for evaluating reward models in formal theorem proving with Lean 4. Our benchmark consists of 250 preference pairs where correct proofs are paired with incorrect variants generated through five expert curated error injection strategies: forced…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
