Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges

Husnain Amjad; Raja Khurram Shahzad; Aamir Shahzad; Mehwish Fatima

arXiv:2605.19723·cs.CL·May 20, 2026

TL;DR

This survey reviews recent progress in mathematical reasoning with large language models, analyzing datasets, architectures, evaluation methods, and open challenges to guide future research.

Contribution

It introduces a unified taxonomy of datasets, systematically analyzes reasoning architectures and training strategies, and highlights key challenges and future directions.

Findings

01. Unified taxonomy of mathematical datasets

02. Analysis of reasoning architectures and training strategies

03. Identification of recurring failure modes and research gaps

Abstract

Mathematical reasoning is essential for problem-solving in education, science, and industry, and it serves as a crucial benchmark for evaluating artificial intelligence systems. As Large Language Models (LLMs) improve their reasoning capabilities, understanding how well they perform mathematical reasoning has become increasingly important. This survey synthesizes recent advances in mathematical reasoning with LLMs through a structured analysis of datasets, architectures, training strategies, and evaluation protocols. Our systematic review covers approximately 120 peer-reviewed studies and preprints, examining the evolution of this research area and providing a unified analytical framework for understanding current progress and limitations. In particular, our study introduces a unified taxonomy of mathematical datasets, distinguishing between pretraining corpora, supervised fine-tuning…
