Evaluating the Reasoning Abilities of LLMs on Underrepresented Mathematics Competition Problems
Samuel Golladay, Majid Bani-Yaghoub

TL;DR
This study assesses the reasoning abilities of three leading LLMs on underrepresented mathematics competition problems, revealing strengths in calculus and weaknesses in geometry, with distinct error patterns across models.
Contribution
The study evaluates LLM performance on underrepresented mathematics competition problems, identifying specific error types and differences between models in reasoning and accuracy.
Findings
DeepSeek-V3 outperforms other models across categories.
All models show weak performance in Geometry.
Different models exhibit distinct error patterns.
Abstract
Understanding the limitations of Large Language Models (LLMs) in mathematical reasoning has been the focus of several recent studies. However, most of these studies benchmark on the same datasets, which limits the generalizability of their findings and may not fully capture the diverse challenges present in mathematical tasks. The purpose of the present study is to analyze the performance of LLMs on underrepresented mathematics competition problems. We prompted three leading LLMs, namely GPT-4o-mini, Gemini-2.0-Flash, and DeepSeek-V3, with the Missouri Collegiate Mathematics Competition problems in the areas of Calculus, Analytic Geometry, and Discrete Mathematics. The LLMs' responses were then compared to the known correct solutions to determine each LLM's accuracy in each problem domain. We also analyzed the LLMs' reasoning to explore patterns in errors…
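The per-domain accuracy comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' grading code: the function name `accuracy_by_domain`, the record layout, the model labels `model_a` and `model_b`, and every grade in `records` are hypothetical placeholders, not results from the paper. Only the three domain names come from the abstract.

```python
from collections import defaultdict


def accuracy_by_domain(records):
    """Return {(model, domain): fraction of problems graded correct}.

    Each record is a (model, domain, is_correct) tuple, where is_correct
    is the outcome of comparing a response with the known solution.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, domain, is_correct in records:
        key = (model, domain)
        totals[key] += 1
        if is_correct:
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}


# Placeholder grades for illustration only; NOT data from the paper.
records = [
    ("model_a", "Calculus", True),
    ("model_a", "Calculus", False),
    ("model_a", "Analytic Geometry", False),
    ("model_b", "Calculus", True),
    ("model_b", "Discrete Mathematics", True),
]

scores = accuracy_by_domain(records)
for (model, domain), acc in sorted(scores.items()):
    print(f"{model:8s} {domain:22s} {acc:.2f}")
```

Keying the tallies on the (model, domain) pair keeps the per-domain breakdown separate for each model, which is the granularity the abstract describes.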
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Mathematics Education and Teaching Techniques · Intelligent Tutoring Systems and Adaptive Learning
