RLMEval: Evaluating Research-Level Neural Theorem Proving

Auguste Poiroux; Antoine Bosselut; Viktor Kun\v{c}ak

arXiv:2510.25427·cs.CL·October 30, 2025

RLMEval: Evaluating Research-Level Neural Theorem Proving

Auguste Poiroux, Antoine Bosselut, Viktor Kun\v{c}ak

PDF

2 Videos

TL;DR

RLMEval is a new evaluation suite designed to assess the performance of neural theorem proving models on complex, research-level mathematics, revealing a significant gap between benchmark results and real-world applicability.

Contribution

The paper introduces RLMEval, a challenging benchmark based on real-world Lean formalizations, to better evaluate neural theorem proving in research-level mathematics.

Findings

01

State-of-the-art models achieve only 10.3% pass rate on RLMEval.

02

Existing benchmarks do not reflect the difficulty of research-level theorems.

03

RLMEval highlights the need for improved models for formal mathematics.

Abstract

Despite impressive results on curated benchmarks, the practical impact of large language models (LLMs) on research-level neural theorem proving and proof autoformalization is still limited. We introduce RLMEval, an evaluation suite for these tasks, focusing on research-level mathematics from real-world Lean formalization projects. RLMEval targets the evaluation of neural theorem proving and proof autoformalization on challenging research-level theorems by leveraging real Lean Blueprint formalization projects. Our evaluation of state-of-the-art models on RLMEval, comprising 613 theorems from 6 Lean projects, reveals a significant gap: progress on existing benchmarks does not readily translate to these more realistic settings, with the best model achieving only a 10.3 % pass rate. RLMEval provides a new, challenging benchmark designed to guide and accelerate progress in automated…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

RLMEval: Evaluating Research-Level Neural Theorem Proving· underline