MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba

TL;DR
MathNet is a comprehensive, multilingual dataset and benchmark for mathematical reasoning and retrieval, covering diverse Olympiad problems and challenging state-of-the-art models across multiple tasks.
Contribution
It introduces the largest high-quality Olympiad-level math dataset and the first benchmark for mathematical problem retrieval, supporting multiple tasks and multilingual data.
Findings
State-of-the-art models still struggle with problem-solving accuracy.
Embedding models have difficulty retrieving equivalent problems.
Retrieval-augmented generation benefits significantly from high-quality retrieval.
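To make the retrieval finding concrete, here is a minimal sketch of how an embedding model could be scored on retrieving an equivalent problem. It is not the paper's evaluation code: the metric (Recall@k over cosine similarity), the function name `recall_at_k`, and the toy 2-D vectors are all assumptions chosen for illustration. A real run would use embeddings produced by an actual model.

```python
import numpy as np

def recall_at_k(query_emb, corpus_emb, gold, k):
    """Fraction of queries whose gold (equivalent) problem appears in the
    top-k corpus items ranked by cosine similarity.

    Hypothetical helper for illustration; not taken from the MathNet code.
    """
    # Normalize rows so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = q @ c.T
    # Indices of the k most similar corpus items for each query.
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [gold[i] in topk[i] for i in range(len(gold))]
    return float(np.mean(hits))

# Toy stand-ins for problem embeddings (assumed values, not real data).
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[0.9, 0.1], [0.6, 0.5]])
gold = [0, 1]  # index of the equivalent corpus problem for each query

print(recall_at_k(queries, corpus, gold, k=1))  # 0.5
print(recall_at_k(queries, corpus, gold, k=3))  # 1.0
```

In this toy case the second query sits closest to a structurally similar but non-equivalent item, so its gold match only appears at rank 3. That is the kind of failure the finding above describes.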
Abstract
Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, comprising 30,676 expert-authored problems with solutions across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. MathNet supports three tasks: (i) Problem Solving, (ii) Math-Aware Retrieval, and (iii) Retrieval-Augmented…
Code & Models
- ShadenA/MathNet (dataset, 18k downloads)
- musumecmtcd/MathNet (dataset, 1.1k downloads)
- anuragsbaghel/MathNet (dataset, 322 downloads)
- introvoyz041/MathNet (dataset, 1.1k downloads)
- knowurknottty/MathNet (dataset, 5.1k downloads)
- Bacon935/MathNet (dataset, 1.3k downloads)
- archya/MathNet (dataset, 3.5k downloads)
- weylmann/MathNet (dataset, 3.7k downloads)
