CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam

Ruslan Khrulev

arXiv:2507.22958·cs.CV·August 1, 2025

TL;DR

This paper presents a new benchmark for evaluating vision-language models' ability to assess handwritten math solutions, focusing on understanding student work, identifying errors, and grading according to official criteria.

Contribution

It introduces the EGE-Math Solutions Assessment Benchmark and evaluates multiple VLMs on real exam data, highlighting current limitations in AI-based assessment.

Findings

01. Current VLMs struggle with mathematical reasoning.

02. Models show limited alignment with human grading rubrics.

03. The benchmark provides a new standard for AI assessment of handwritten math solutions.

Abstract

This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. The code is available at https://github.com/Karifannaa/Auto-check-EGE-math
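To make "human-rubric alignment" concrete, the following is a minimal sketch of how a model's grades might be compared against expert grades. The abstract does not name the metrics used, so exact-match rate and mean absolute error are assumptions chosen for illustration. The function name `grade_agreement` and the toy scores are hypothetical and do not come from the benchmark.

```python
from typing import Dict, Sequence


def grade_agreement(expert: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compare model-assigned grades with expert grades.

    Both metrics are illustrative assumptions, not the paper's stated protocol:
    - exact_match: share of solutions where the model gives the expert's grade
    - mean_abs_error: average distance in points between the two grades
    """
    if len(expert) != len(predicted):
        raise ValueError("expert and predicted must have the same length")
    if not expert:
        raise ValueError("at least one graded solution is required")

    n = len(expert)
    exact = sum(e == p for e, p in zip(expert, predicted)) / n
    mae = sum(abs(e - p) for e, p in zip(expert, predicted)) / n
    return {"exact_match": exact, "mean_abs_error": mae}


# Toy data (made up for this example): four solutions graded in points.
expert_scores = [2, 0, 1, 3]
model_scores = [2, 1, 1, 1]

result = grade_agreement(expert_scores, model_scores)
print(result)
```

Exact match treats a one-point miss the same as a three-point miss, while mean absolute error distinguishes them, which is why a sketch like this reports both.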

Code & Models

Datasets

Karifannaa/EGE_Math_Solutions_Assessment_Benchmark
dataset · 44 downloads

Taxonomy

Topics: Higher Education Learning Practices