Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology

Wei Xie; Shuoyoucheng Ma; Zhenhua Wang; Enze Wang; Kai Chen; Xiaobing Sun; Baosheng Wang

arXiv:2410.14979 · cs.AI · September 23, 2025

TL;DR

This study investigates whether large language models (LLMs) truly understand mathematics by testing them on modified versions of Cognitive Reflection Test (CRT) problems. The results suggest that LLMs rely more on pattern matching than on genuine reasoning, challenging the assumption that their cognition is human-like.

Contribution

The paper provides empirical evidence that mainstream LLMs do not exhibit human-like mathematical reasoning and instead depend primarily on pattern matching, even with Chain-of-Thought (CoT) prompting and even for reasoning-focused models such as o1.

Findings

1. LLMs perform poorly on modified CRT problems: average accuracy drops by up to 50% compared with the original questions.
2. LLMs rely mainly on pattern matching from their training data, akin to intuitive System 1 thinking.
3. These results challenge the notion that LLMs possess human-like mathematical reasoning.

Abstract

The cognitive mechanism by which Large Language Models (LLMs) solve mathematical problems remains widely debated and unresolved. Currently, there is little interpretable experimental evidence connecting LLMs' problem-solving with human cognitive psychology. To determine whether LLMs possess human-like mathematical reasoning, we modified the problems used in the human Cognitive Reflection Test (CRT). Our results show that, even with Chain-of-Thought (CoT) prompting, mainstream LLMs, including the latest o1 model (noted for its reasoning capabilities), have a high error rate when solving these modified CRT problems. Specifically, the average accuracy dropped by up to 50% compared to the original questions. Further analysis of LLMs' incorrect answers suggests that they primarily rely on pattern matching from their training data, which aligns more with human intuition…
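To make the test design concrete, the sketch below uses the classic bat-and-ball item from Frederick's original CRT: "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?" The intuitive lure is $0.10; the correct answer is $0.05. The changed-number variant and the "memorized" responder are hypothetical illustrations of the pattern-matching failure the abstract describes; they are not problems or models taken from the paper.

```python
from fractions import Fraction


def solve_ball_price(total: Fraction, difference: Fraction) -> Fraction:
    """Deliberate (System 2) solution.

    bat + ball = total and bat = ball + difference, so
    2 * ball + difference = total, i.e. ball = (total - difference) / 2.
    """
    return (total - difference) / 2


def intuitive_ball_price(total: Fraction, difference: Fraction) -> Fraction:
    """Intuitive (System 1) lure: subtract the difference from the total."""
    return total - difference


# Hypothetical "pattern matcher": recalls the answer to the well-known
# original item instead of recomputing it for the numbers actually given.
MEMORIZED_ORIGINAL_ANSWER = Fraction("0.05")


def dollars(x: Fraction) -> str:
    """Format an exact price as a dollar string."""
    return f"${float(x):.2f}"


items = {
    # Original CRT item: $1.10 total, bat costs $1.00 more than the ball.
    "original": (Fraction("1.10"), Fraction("1.00")),
    # Hypothetical variant with changed numbers (not taken from the paper).
    "variant": (Fraction("2.50"), Fraction("2.00")),
}

for name, (total, diff) in items.items():
    correct = solve_ball_price(total, diff)
    # Sanity check: both constraints of the problem hold exactly.
    assert correct + (correct + diff) == total
    print(
        f"{name}: correct={dollars(correct)}, "
        f"intuitive={dollars(intuitive_ball_price(total, diff))}, "
        f"memorized={dollars(MEMORIZED_ORIGINAL_ANSWER)}"
    )
```

On the original item the memorized answer happens to be right ($0.05), so accuracy there cannot distinguish recall from reasoning. On the variant the correct answer is $0.25, while both the intuitive lure ($0.50) and the recalled answer ($0.05) are wrong, which is the kind of gap a modified CRT problem is meant to expose. `Fraction` is used so that the price arithmetic is exact rather than subject to floating-point rounding.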

Taxonomy

Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics