Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
Liang Zhang, Yu Fu, Xinyi Jin

TL;DR
This study investigates whether large language models' math problem-solving ability is associated with their ability to assess solutions accurately. Stronger problem solving goes with better assessment, but precise error detection requires additional skills.
Contribution
The paper provides empirical evidence that math problem-solving expertise in LLMs is associated with better assessment performance, and it highlights the importance of step tracking and error localization capabilities.
Findings
Assessment accuracy is higher on items the model solved correctly.
Assessment remains more challenging than problem solving, especially on solutions that contain errors.
Strong problem-solving skills support better assessment, but additional capabilities, such as step tracking and error localization, are needed.
Abstract
Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship using the GSM8K and MATH subsets of PROCESSBENCH, a human-annotated benchmark for identifying the earliest erroneous step in mathematical reasoning. We evaluate two LLM-based math tutor agent settings, instantiated with GPT-4 and GPT-5, in two independent tasks on the same math problems: solving the original problem and assessing a benchmark-provided solution by predicting the earliest erroneous step. Results show a consistent within-model pattern: assessment accuracy is substantially higher on math problem items the same model solved correctly than on items it solved…
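The within-model comparison described in the abstract can be illustrated with a small sketch: split the items by whether the model solved them correctly, then compute assessment accuracy in each group. This is only an illustration under stated assumptions, not the authors' evaluation code. The `ItemResult` record, its field names, and the use of `-1` to mean "no erroneous step" are hypothetical choices made for this example, and the toy data is invented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ItemResult:
    """One benchmark item, scored on both tasks (hypothetical record layout)."""
    solved_correctly: bool      # did the model solve the original problem?
    predicted_error_step: int   # model's earliest-error prediction; -1 = "no error" (assumed convention)
    gold_error_step: int        # human-annotated earliest erroneous step; -1 = "no error" (assumed convention)


def conditional_assessment_accuracy(
    results: Iterable[ItemResult],
) -> Dict[bool, Optional[float]]:
    """Assessment accuracy, split by whether the same model solved the item.

    Returns a dict keyed by solved_correctly; a group with no items maps to None.
    """
    # counts[group] = [number of exact step matches, number of items]
    counts = {True: [0, 0], False: [0, 0]}
    for r in results:
        hit = r.predicted_error_step == r.gold_error_step
        counts[r.solved_correctly][0] += int(hit)
        counts[r.solved_correctly][1] += 1
    return {
        group: (hits / total if total else None)
        for group, (hits, total) in counts.items()
    }


# Toy data, invented for illustration only.
toy_results = [
    ItemResult(True, 2, 2),
    ItemResult(True, -1, -1),
    ItemResult(True, 1, 3),
    ItemResult(False, 0, 4),
    ItemResult(False, 4, 4),
]
accuracy_by_group = conditional_assessment_accuracy(toy_results)
```

Comparing `accuracy_by_group[True]` with `accuracy_by_group[False]` gives the kind of within-model contrast the study reports. Exact-match scoring on the step index is one plausible way to score the earliest-error task; the paper may define correctness differently.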
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods · Cognitive and developmental aspects of mathematical skills
