MathAtlas: A Benchmark for Autoformalization in the Wild

Nilay Patel; Noah Arias; Davit Babayan; Victoria Cochran; Timothy Libman; Hafsah Mahmood; Liam McCarty; Soli Munoz; Laurel Willey; Jeffrey Flanigan

arXiv:2605.14061·cs.AI·May 15, 2026

MathAtlas: A Benchmark for Autoformalization in the Wild

Nilay Patel, Noah Arias, Davit Babayan, Victoria Cochran, Timothy Libman, Hafsah Mahmood, Liam McCarty, Soli Munoz, Laurel Willey, Jeffrey Flanigan

PDF

TL;DR

MathAtlas is a large-scale, challenging benchmark dataset for autoformalizing graduate-level mathematics, including a rich dependency graph, to advance research in this complex domain.

Contribution

It introduces MathAtlas, the first extensive autoformalization benchmark for graduate mathematics, with a dependency graph and challenging evaluation results.

Findings

01

Strong baselines achieve at most 9.8% correctness on theorem statements.

02

Performance degrades with dependency depth, with only 2.6% correctness on deep dependency trees.

03

MathAtlas is high quality but extremely challenging for current models.

Abstract

Current autoformalization benchmarks are largely focused on olympiad or undergraduate mathematics, while graduate and research-level mathematics remains underexplored. In this paper, we introduce MathAtlas, the first large-scale autoformalization benchmark of in the wild graduate-level mathematics, containing ~52k theorems, definitions, exercises, examples, and proofs extracted from 103 graduate mathematics textbooks. MathAtlas is enriched with a mathematical dependency graph containing ~178k relations, and is the first autoformalization benchmark to include such relations, facilitating evaluation and development of dependency-aware autoformalization systems. Our extensive experiments show that MathAtlas is high quality but extremely challenging: strong baselines achieve at most 9.8% correctness on theorem statements and 16.7% on definitions. Furthermore, we find performance of…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.