Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen

TL;DR
This paper introduces ScratchMath, a benchmark dataset for analyzing errors in handwritten math work, evaluating 16 multimodal models and highlighting gaps in current AI capabilities for educational error diagnosis.
Contribution
The paper presents ScratchMath, a new annotated dataset for explaining errors in handwritten math scratchwork, and systematically evaluates 16 multimodal models on it, showing where they still fall short of human experts.
Findings
Proprietary models outperform open-source counterparts.
Large reasoning models excel at error explanation.
Significant performance gaps remain compared to human experts.
Abstract
Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inherent in authentic handwritten scratchwork. Current multimodal large language models (MLLMs) excel at visual reasoning but typically adopt an "examinee perspective", prioritizing generating correct answers rather than diagnosing student errors. To bridge these gaps, we introduce ScratchMath, a novel benchmark specifically designed for explaining and classifying errors in authentic handwritten mathematics scratchwork. Our dataset comprises 1,720 mathematics samples from Chinese primary and middle school students, supporting two key tasks: Error Cause Explanation (ECE) and…
Taxonomy
Topics: Mathematics Education and Teaching Techniques · Intelligent Tutoring Systems and Adaptive Learning · Multimodal Machine Learning Applications
