Large Language Models Don't Make Sense of Word Problems. A Scoping Review from a Mathematics Education Perspective
Anselm R. Strohmaier, Wim Van Dooren, Kathrin Seßler, Brian Greer, Lieven Verschaffel

TL;DR
This paper reviews the capabilities of large language models in solving mathematical word problems, finding they excel at superficial problem-solving but struggle with real-world context understanding, limiting their educational usefulness.
Contribution
It provides a comprehensive scoping review, including technical, literature, and empirical analyses, revealing LLMs' superficial understanding of word problems from a mathematics education perspective.
Findings
LLMs solve s-problems (standard word problems that can be solved without considering the real-world context) with near-perfect accuracy.
Most word problems in research lack real-world context.
LLMs struggle with problems whose solution depends on making sense of the real-world context, or whose context is nonsensical.
Abstract
The progress of Large Language Models (LLMs) like ChatGPT raises the question of how they can be integrated into education. One hope is that they can support mathematics learning, including word-problem solving. Since LLMs can handle textual input with ease, they appear well-suited for solving mathematical word problems. Yet their real competence, whether they can make sense of the real-world context, and the implications for classrooms remain unclear. We conducted a scoping review from a mathematics-education perspective, including three parts: a technical overview, a systematic review of word problems used in research, and a state-of-the-art empirical evaluation of LLMs on mathematical word problems. First, in the technical overview, we contrast the conceptualization of word problems and their solution processes between LLMs and students. In computer-science research this is typically…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Computational and Text Analysis Methods
Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer · PrIme Sample Attention · ALIGN
