Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions

Lan Zhang; Marco Valentino; Andre Freitas

arXiv:2502.12065·cs.CL·September 5, 2025

Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions

Lan Zhang, Marco Valentino, Andre Freitas

PDF

Open Access 1 Repo

TL;DR

This paper evaluates how well large language models can autoformalize real-world mathematical definitions from sources like Wikipedia and arXiv, highlighting challenges and proposing strategies to improve their accuracy and reliability.

Contribution

It introduces new datasets for autoformalization, systematically evaluates LLMs on real-world definitions, and explores refinement and grounding strategies to enhance formalization performance.

Findings

01

Definitions are more challenging than benchmarks like miniF2F.

02

LLMs struggle with self-correction and library alignment.

03

Refinement and grounding improve formalization accuracy significantly.

Abstract

Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions: a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance including refinement through external feedback from Proof Assistants, and formal definition grounding, where we augment LLMs'…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

lanzhang128/definition_autoformalization
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMathematics, Computing, and Information Processing · Natural Language Processing Techniques