The CompMath-MCQ Dataset: Are LLMs Ready for Higher-Level Math?

Bianca Raimondi; Francesco Pivi; Davide Evangelista; Maurizio Gabbrielli

arXiv:2603.03334·cs.CL·March 5, 2026



TL;DR

This paper introduces CompMath-MCQ, a benchmark of 1,500 newly authored, three-option multiple-choice questions that evaluates how well Large Language Models reason about graduate-level and computational mathematics.

Contribution

The paper presents a novel, carefully curated multiple-choice dataset for assessing LLMs on complex mathematical reasoning beyond elementary problems.

Findings

01. State-of-the-art LLMs struggle with advanced mathematical reasoning.

02. The dataset enables objective and reproducible evaluation.

03. Questions are newly created to prevent data leakage.

Abstract

The evaluation of Large Language Models (LLMs) on mathematical reasoning has largely focused on elementary problems, competition-style questions, or formal theorem proving, leaving graduate-level and computational mathematics relatively underexplored. We introduce CompMath-MCQ, a new benchmark dataset for assessing LLMs on advanced mathematical reasoning in a multiple-choice setting. The dataset consists of 1,500 questions originally authored by professors of graduate-level courses, covering topics including Linear Algebra, Numerical Optimization, Vector Calculus, Probability, and Python-based scientific computing. Each question offers three answer options, exactly one of which is correct. To prevent data leakage, all questions are newly created and not sourced from existing materials. The validity of questions is verified through a procedure based on…
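Because each question has three options with exactly one correct, a model that guesses uniformly at random is expected to score 1/3, which is a useful floor when reading accuracy numbers. The sketch below shows one way such scoring could look. The `MCQItem` class, its field names, the `accuracy` helper, and the three sample questions are all hypothetical illustrations; they are not taken from the dataset and do not reflect its actual schema or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical record layout; the real dataset schema may differ.
@dataclass(frozen=True)
class MCQItem:
    question: str
    options: Tuple[str, str, str]  # exactly three answer options
    answer_index: int              # index (0, 1, or 2) of the correct option

# With three options and one correct answer, uniform guessing scores 1/3.
CHANCE_ACCURACY = 1 / 3

def accuracy(items: Sequence[MCQItem], predictions: Sequence[int]) -> float:
    """Return the fraction of items whose predicted index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("cannot score an empty set of items")
    for pred in predictions:
        if pred not in (0, 1, 2):
            raise ValueError(f"prediction {pred!r} is not a valid option index")
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)

# Invented sample questions, for illustration only.
EXAMPLE_ITEMS = [
    MCQItem(
        "What is the determinant of the 2x2 identity matrix?",
        ("0", "1", "2"),
        1,
    ),
    MCQItem(
        "What is the gradient of f(x, y) = x^2 + y^2 at (1, 0)?",
        ("(2, 0)", "(1, 0)", "(0, 2)"),
        0,
    ),
    MCQItem(
        "What is numpy.zeros((2, 3)).shape?",
        ("(3, 2)", "(6,)", "(2, 3)"),
        2,
    ),
]

if __name__ == "__main__":
    # Hypothetical model outputs: the first two are right, the third is wrong.
    model_predictions = [1, 0, 0]
    score = accuracy(EXAMPLE_ITEMS, model_predictions)
    print(f"accuracy = {score:.3f}, chance = {CHANCE_ACCURACY:.3f}")
```

Scoring by exact match on the option index is what makes a multiple-choice benchmark objective and reproducible: no free-form answer parsing or human grading is needed.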


Code & Models

Datasets

biancaraimondi/CompMath-MCQ
dataset · 39 downloads


Taxonomy

Topics: Mathematics, Computing, and Information Processing · Machine Learning in Materials Science · Mathematics Education and Programs