CogMath: Assessing LLMs' Authentic Mathematical Ability from a Human Cognitive Perspective
Jiayu Liu, Zhenya Huang, Wei Dai, Cheng Cheng, Jinze Wu, Jing Sha, Song Li, Qi Liu, Shijin Wang, Enhong Chen

TL;DR
CogMath introduces a human cognition-inspired framework to evaluate LLMs' mathematical abilities across detailed reasoning stages, revealing overestimations in current benchmarks and providing insights for improvement.
Contribution
This paper presents CogMath, a novel multi-dimensional assessment method based on human reasoning stages to evaluate LLMs' authentic mathematical capabilities.
Findings
Traditional benchmarks overestimate LLMs' mathematical abilities by 30-40%.
CogMath identifies specific strengths and weaknesses of LLMs across reasoning stages.
The framework offers in-depth insights to guide future improvements in LLM reasoning.
Abstract
Although large language models (LLMs) show promise in solving complex mathematical tasks, existing evaluation paradigms rely solely on a coarse measure of overall answer accuracy, which is insufficient for assessing their authentic capabilities. In this paper, we propose CogMath, which comprehensively assesses LLMs' mathematical abilities through the lens of human cognition. Specifically, inspired by psychological theories, CogMath formalizes the human reasoning process into 3 stages: problem comprehension, problem solving, and solution summarization. Within these stages, we investigate perspectives such as numerical calculation, knowledge, and counterfactuals, and design a total of 9 fine-grained evaluation dimensions. In each dimension, we develop an "Inquiry-Judge-Reference" multi-agent system to generate inquiries that assess LLMs'…
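To make the contrast between overall answer accuracy and dimension-level assessment concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `ProblemResult` class and the three scoring functions are hypothetical names, the dimension keys reuse the three perspectives named in the abstract rather than the paper's actual nine dimensions, and the rule that a problem counts as mastered only when the answer is correct and every dimension inquiry is passed is an assumption not stated in the truncated abstract. The toy values are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class ProblemResult:
    """Outcome for one problem (hypothetical structure, for illustration only)."""
    answer_correct: bool              # the coarse signal used by answer-accuracy benchmarks
    dimension_passed: dict[str, bool]  # dimension name -> whether the model passed its inquiry


def answer_accuracy(results: list[ProblemResult]) -> float:
    """Fraction of problems whose final answer is correct."""
    return sum(r.answer_correct for r in results) / len(results)


def strict_pass_rate(results: list[ProblemResult]) -> float:
    """Fraction of problems with a correct answer AND every dimension passed.

    Assumption: the all-dimensions rule is illustrative, not taken from the abstract.
    """
    mastered = sum(
        r.answer_correct and all(r.dimension_passed.values()) for r in results
    )
    return mastered / len(results)


def per_dimension_pass_rate(results: list[ProblemResult]) -> dict[str, float]:
    """Pass rate for each dimension, exposing where a model is weak."""
    names = results[0].dimension_passed.keys()
    return {
        name: sum(r.dimension_passed[name] for r in results) / len(results)
        for name in names
    }


# Toy data: placeholder dimensions named after perspectives mentioned in the abstract.
NC, K, CF = "numerical calculation", "knowledge", "counterfactuals"
results = [
    ProblemResult(True,  {NC: True,  K: True,  CF: True}),
    ProblemResult(True,  {NC: True,  K: False, CF: True}),
    ProblemResult(True,  {NC: True,  K: True,  CF: False}),
    ProblemResult(False, {NC: False, K: True,  CF: True}),
]

print("answer accuracy:", answer_accuracy(results))
print("strict pass rate:", strict_pass_rate(results))
print("per-dimension:", per_dimension_pass_rate(results))
```

In this toy setup the strict pass rate falls below the answer accuracy because some correctly answered problems fail one dimension inquiry, which is the kind of gap a dimension-level assessment is meant to surface.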
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Cognitive and developmental aspects of mathematical skills · Mathematics Education and Teaching Techniques
