CHECK-MAT: Checking Hand-Written Mathematical Answers for the Russian Unified State Exam
Ruslan Khrulev

TL;DR
This paper presents a new benchmark for evaluating vision-language models' ability to assess handwritten math solutions, focusing on understanding student work, identifying errors, and grading according to official criteria.
Contribution
It introduces the EGE-Math Solutions Assessment Benchmark and evaluates multiple VLMs on real exam data, highlighting current limitations in AI-based assessment.
Findings
Current VLMs struggle with mathematical reasoning.
Models show limited alignment with human grading rubrics.
The benchmark provides a new standard for AI assessment of handwritten math solutions.
Abstract
This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess hand-written mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. The code is available at https://github.com/Karifannaa/Auto-check-EGE-math
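As a rough illustration of what comparing model-assigned grades against official expert grades can look like, the sketch below computes two simple agreement measures. The function name, the choice of metrics (exact-match rate and mean absolute error), and the sample scores are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import Dict, Sequence


def grading_agreement(expert: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compare model-assigned scores with expert scores (hypothetical helper).

    Returns the share of solutions where the scores match exactly and the
    mean absolute difference between the two scores.
    """
    if len(expert) != len(predicted) or len(expert) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(expert)
    exact = sum(e == p for e, p in zip(expert, predicted)) / n
    mae = sum(abs(e - p) for e, p in zip(expert, predicted)) / n
    return {"exact_match": exact, "mean_abs_error": mae}


# Made-up scores for four solutions; not data from the benchmark.
expert_scores = [2, 0, 1, 3]
model_scores = [2, 1, 1, 1]
result = grading_agreement(expert_scores, model_scores)
print(result)
```

Exact match treats every disagreement the same, while mean absolute error also reflects how far a model's score lands from the expert's, which matters when a task is graded on a multi-point scale.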
Taxonomy
Topics: Higher Education Learning Practices
