VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions

Thu Phuong Nguyen; Duc M. Nguyen; Hyotaek Jeon; Hyunwook Lee; Hyunmin Song; Sungahn Ko; Taehwan Kim

arXiv:2510.22798·cs.CL·October 28, 2025

TL;DR

VEHME is a novel vision-language model designed to accurately evaluate handwritten mathematical expressions, providing interpretable reasoning and outperforming existing open-source solutions in educational technology.

Contribution

It introduces a two-phase training pipeline and an Expression-Aware Visual Prompting Module for improved spatial understanding and assessment of handwritten math responses.

Findings

01 Achieves state-of-the-art performance among open-source models on the AIHub and FERMAT datasets.

02 Approaches the accuracy of proprietary systems while remaining open-source.

03 Demonstrates robustness across diverse handwritten math formats.

Abstract

Automatically assessing handwritten mathematical solutions is an important problem in educational technology with practical applications, but it remains a significant challenge due to the diverse formats, unstructured layouts, and symbolic complexity of student work. To address this challenge, we introduce VEHME, a Vision-Language Model for Evaluating Handwritten Mathematics Expressions, designed to assess open-form handwritten math responses with high accuracy and interpretable reasoning traces. VEHME integrates a two-phase training pipeline: (i) supervised fine-tuning using structured reasoning data, and (ii) reinforcement learning that aligns model outputs with multi-dimensional grading objectives, including correctness, reasoning depth, and error localization. To enhance spatial understanding, we propose an Expression-Aware Visual Prompting Module, trained on our synthesized…
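
To make the idea of a multi-dimensional grading objective concrete, the sketch below folds three per-response scores (correctness, reasoning depth, error localization) into one scalar reward of the kind a reinforcement learning phase could optimize. It is an illustration only, not the authors' reward: the `GradingScores` type, the `combined_reward` function, the [0, 1] score range, and the default weights are all assumptions made here, since the abstract does not specify how the objectives are scored or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradingScores:
    """Hypothetical per-response scores, each assumed to lie in [0, 1]."""
    correctness: float         # e.g. 1.0 if the verdict matches the label
    reasoning_depth: float     # e.g. a rubric score for the reasoning trace
    error_localization: float  # e.g. overlap with the annotated error step


def combined_reward(scores: GradingScores,
                    weights: tuple = (0.5, 0.25, 0.25)) -> float:
    """Weighted average of the three scores (weights are an assumption)."""
    values = (scores.correctness,
              scores.reasoning_depth,
              scores.error_localization)
    if len(weights) != len(values):
        raise ValueError("expected one weight per score")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each score must lie in [0, 1]")
    # Normalize by the weight sum so the reward also stays in [0, 1].
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    example = GradingScores(correctness=1.0,
                            reasoning_depth=0.5,
                            error_localization=0.0)
    print(combined_reward(example))
```

A weighted average is only one simple way to combine the objectives; gating the other terms on correctness or using a learned reward model would be equally plausible readings of the abstract.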
