MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

Shaden Alshammari; Kevin Wen; Abrar Zainal; Mark Hamilton; Navid Safaei; Sultan Albarakati; William T. Freeman; Antonio Torralba

arXiv:2604.18584 · cs.AI · April 21, 2026

TL;DR

MathNet is a comprehensive, multilingual dataset and benchmark for mathematical reasoning and retrieval. It covers diverse Olympiad problems, and its tasks remain challenging for state-of-the-art models.

Contribution

It introduces the largest high-quality Olympiad-level math dataset and the first benchmark for mathematical problem retrieval, supporting multiple tasks and multilingual data.

Findings

01. State-of-the-art models still struggle with problem-solving accuracy.

02. Embedding models have difficulty retrieving mathematically equivalent problems.

03. Retrieval-augmented generation benefits significantly from high-quality retrieval.

Abstract

Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing benchmarks are limited in size, language coverage, and task diversity. We introduce MathNet, a high-quality, large-scale, multimodal, and multilingual dataset of Olympiad-level math problems together with a benchmark for evaluating mathematical reasoning in generative models and mathematical retrieval in embedding-based systems. MathNet spans 47 countries, 17 languages, and two decades of competitions, comprising 30,676 expert-authored problems with solutions across diverse domains. In addition to the core dataset, we construct a retrieval benchmark consisting of mathematically equivalent and structurally similar problem pairs curated by human experts. MathNet supports three tasks: (i) Problem Solving, (ii) Math-Aware Retrieval, and (iii) Retrieval-Augmented…
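To make the Math-Aware Retrieval task concrete, here is a minimal sketch of how an embedding model could be scored on pairs of equivalent problems. It assumes cosine similarity for ranking and Recall@k as the metric, which is a common choice for retrieval benchmarks; the text above does not state which metric MathNet reports. The function names, toy vectors, and gold labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(query_vecs, corpus_vecs, gold, k):
    """Fraction of queries whose labelled match is among the top-k corpus items.

    gold[i] is the corpus index of the problem marked equivalent to query i
    (an assumed labelling scheme, not the benchmark's actual file format).
    """
    hits = 0
    for q_idx, q in enumerate(query_vecs):
        # Rank every corpus problem by similarity to the query, best first.
        ranked = sorted(
            range(len(corpus_vecs)),
            key=lambda j: cosine(q, corpus_vecs[j]),
            reverse=True,
        )
        if gold[q_idx] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy 2-D "embeddings"; a real setup would use model outputs instead.
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.6, 0.8]]
gold = [0, 1]

print(recall_at_k(queries, corpus, gold, k=1))
print(recall_at_k(queries, corpus, gold, k=2))
```

In this toy data the second query ranks a similar-looking distractor above its labelled match, so it only counts as a hit at k=2. That is the kind of failure the finding on equivalent-problem retrieval describes.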

Videos

MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval (SlidesLive)