Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions
Lan Zhang, Marco Valentino, Andre Freitas

TL;DR
This paper evaluates how well large language models can autoformalize real-world mathematical definitions from sources like Wikipedia and arXiv, highlighting challenges and proposing strategies to improve their accuracy and reliability.
Contribution
It introduces new datasets for autoformalization, systematically evaluates LLMs on real-world definitions, and explores refinement and grounding strategies to enhance formalization performance.
Findings
Definitions are more challenging than benchmarks like miniF2F.
LLMs struggle with self-correction and library alignment.
Refinement and grounding improve formalization accuracy significantly.
Abstract
Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions: a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance including refinement through external feedback from Proof Assistants, and formal definition grounding, where we augment LLMs'…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing · Natural Language Processing Techniques
