miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

Azim Ospanov; Farzan Farnia; Roozbeh Yousefzadeh

arXiv:2511.03108·cs.AI·November 6, 2025

miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh

PDF

Open Access 2 Datasets 1 Video

TL;DR

This paper revisits the miniF2F benchmark, analyzing the limitations of current AI systems in formal mathematical reasoning, and introduces miniF2F-v2 with improved accuracy and verified statements to better evaluate progress in the field.

Contribution

The paper provides a detailed analysis of miniF2F's limitations, corrects discrepancies to create miniF2F-v2, and demonstrates significant accuracy improvements in theorem proving pipelines.

Findings

01

Best pipeline accuracy is 36% with current models.

02

Corrected discrepancies increased accuracy to 70%.

03

Majority of failures stem from formal-informal statement misalignments.

Abstract

We perform a thorough analysis of the formal and informal statements in the miniF2F benchmark from the perspective of an AI system that is tasked to participate in a math Olympiad consisting of the problems in miniF2F. In such setting, the model has to read and comprehend the problems in natural language, formalize them in Lean language, then proceed with proving the problems, and it will get credit for each problem if the formal proof corresponds to the original informal statement presented to the model. Our evaluation results reveal that the best accuracy of such pipeline can be about 36% using the SoTA models in the literature, considerably lower than the individual SoTA accuracies, 97% and 69% reported in the autoformalization and theorem proving literature. Analyzing the failure modes, we trace back a considerable portion of this drop to discrepancies between the formal and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Datasets

Videos

miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward· slideslive

Taxonomy

TopicsLogic, programming, and type systems · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing