TL;DR
VEHME is a vision-language model that evaluates handwritten mathematical expressions for educational technology, producing interpretable reasoning and outperforming existing open-source solutions.
Contribution
It introduces a two-phase training pipeline and an Expression-Aware Visual Prompting Module for improved spatial understanding and assessment of handwritten math responses.
Findings
Achieves state-of-the-art performance among open-source models on the AIHub and FERMAT datasets.
Approaches the accuracy of proprietary systems while using open-source methods.
Remains robust across diverse handwritten math formats.
Abstract
Automatically assessing handwritten mathematical solutions is an important problem in educational technology with practical applications, but it remains a significant challenge due to the diverse formats, unstructured layouts, and symbolic complexity of student work. To address this challenge, we introduce VEHME, a Vision-Language Model for Evaluating Handwritten Mathematics Expressions, designed to assess open-form handwritten math responses with high accuracy and interpretable reasoning traces. VEHME integrates a two-phase training pipeline: (i) supervised fine-tuning using structured reasoning data, and (ii) reinforcement learning that aligns model outputs with multi-dimensional grading objectives, including correctness, reasoning depth, and error localization. To enhance spatial understanding, we propose an Expression-Aware Visual Prompting Module, trained on our synthesized…
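The abstract names three grading objectives for the reinforcement learning phase (correctness, reasoning depth, and error localization) but does not say how they are combined. As a minimal sketch only, one common way to fold several objectives into a single scalar reward is a weighted sum. Everything below is an assumption for illustration: the `GradingScores` and `composite_reward` names, the [0, 1] score range, and the weights are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingScores:
    """Hypothetical per-response sub-scores, each assumed to lie in [0, 1]."""
    correctness: float         # agreement with the reference correct/incorrect label
    reasoning_depth: float     # assumed quality score for the reasoning trace
    error_localization: float  # assumed score for pinpointing where the error is


def composite_reward(scores, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of the three sub-scores.

    The weights are illustrative placeholders, not values from the paper.
    """
    parts = (scores.correctness, scores.reasoning_depth, scores.error_localization)
    for p in parts:
        if not 0.0 <= p <= 1.0:
            raise ValueError("sub-scores must lie in [0, 1]")
    return sum(w * p for w, p in zip(weights, parts))


# Usage: a correct verdict with a mediocre trace and no error localization.
reward = composite_reward(GradingScores(1.0, 0.5, 0.0))  # 0.625
```

A weighted sum keeps each objective separately inspectable, which is convenient when tuning; the actual reward design used by VEHME may differ.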