Grading Handwritten Engineering Exams with Multimodal Large Language Models

Janez Perš; Jon Muhovič; Andrej Košir; Boštjan Murovec

arXiv:2601.00730·cs.CV·January 5, 2026

TL;DR

This paper introduces an end-to-end system that uses multimodal LLMs to grade scanned handwritten engineering exams. It keeps the standard exam format, agrees with lecturer grades to within about 8 points on average, and produces auditable reports.

Contribution

The paper presents a novel workflow that combines multimodal LLMs with structured prompting and reference grounding to grade handwritten STEM exams reliably and at scale.

Findings

1. Achieves a mean absolute difference of approximately 8 points relative to lecturer grades.

2. Shows low bias, with a 17% manual-review trigger rate at D_max = 40.

3. Structured prompting and reference grounding are crucial for accuracy.
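The headline numbers above (mean absolute difference, bias, and manual-review trigger rate) are agreement statistics between automatic and lecturer grades. The following is a minimal sketch of how such statistics could be computed, assuming each exam has one automatic score and one lecturer score on the same scale, plus a boolean flag marking whether it was routed to manual review. The helper name `agreement_metrics` and the definition of bias as the mean signed difference are our assumptions, not taken from the paper.

```python
from statistics import mean


def agreement_metrics(llm_scores, lecturer_scores, review_flags):
    """Hypothetical helper: summarize agreement between automatic and lecturer grades.

    Assumes all three sequences are aligned per exam and use the same point scale.
    """
    if not (len(llm_scores) == len(lecturer_scores) == len(review_flags)) or not llm_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed per-exam differences: positive means the automatic grade is higher.
    diffs = [a - b for a, b in zip(llm_scores, lecturer_scores)]
    return {
        "mad": mean(abs(d) for d in diffs),  # mean absolute difference
        "bias": mean(diffs),  # assumed bias definition: mean signed difference
        "review_rate": sum(review_flags) / len(review_flags),  # fraction flagged for review
    }
```

Reporting the signed mean alongside the absolute one separates systematic over- or under-grading from random scatter: a small bias with a larger mean absolute difference means the errors largely cancel out.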

Abstract

Handwritten STEM exams capture open-ended reasoning and diagrams, but manual grading is slow and difficult to scale. We present an end-to-end workflow for grading scanned handwritten engineering quizzes with multimodal large language models (LLMs) that preserves the standard exam process (A4 paper, unconstrained student handwriting). The lecturer provides only a handwritten reference solution (100%) and a short set of grading rules; the reference is converted into a text-only summary that conditions grading without exposing the reference scan. Reliability is achieved through a multi-stage design with a format/presence check to prevent grading blank answers, an ensemble of independent graders, supervisor aggregation, and rigid templates with deterministic validation to produce auditable, machine-parseable reports. We evaluate the frozen pipeline in a clean-room protocol on a held-out…
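The multi-stage design described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Graders are plain Python callables standing in for independent LLM calls. The two-line `SCORE:`/`RATIONALE:` report template and the helper names `parse_report` and `grade_answer` are hypothetical. The LLM supervisor is replaced by a deterministic median. Finally, `d_max` is *assumed* (the text here does not define it) to cap the score spread across the ensemble before an answer is sent to manual review.

```python
import re
from statistics import median
from typing import Callable, Optional, Sequence

# Hypothetical rigid report template: "SCORE: <0-100>" on one line, then "RATIONALE: <text>".
REPORT_RE = re.compile(r"SCORE: (\d{1,3})\nRATIONALE: (.+)", re.DOTALL)


def parse_report(report: str) -> Optional[int]:
    """Deterministically validate one grader report; return the score, or None if malformed."""
    m = REPORT_RE.fullmatch(report.strip())
    if m is None:
        return None
    score = int(m.group(1))
    return score if 0 <= score <= 100 else None


def grade_answer(answer_text: str,
                 graders: Sequence[Callable[[str], str]],
                 d_max: int = 40) -> dict:
    # Presence check: a blank answer scores 0 and is never sent to the graders.
    if not answer_text.strip():
        return {"score": 0, "needs_review": False, "reason": "blank"}
    # Ensemble of independent graders, each returning a templated text report.
    scores = [parse_report(g(answer_text)) for g in graders]
    valid = [s for s in scores if s is not None]
    # Any report that fails validation sends the answer to a human.
    if not valid or len(valid) < len(scores):
        return {"score": None, "needs_review": True, "reason": "invalid report"}
    # Stand-in for supervisor aggregation (the paper uses a supervisor model): the median.
    score = median(valid)
    # Assumed review trigger: the graders disagree by more than d_max points.
    spread = max(valid) - min(valid)
    return {"score": score, "needs_review": spread > d_max, "reason": f"spread={spread}"}
```

Rejecting a whole answer when any single report is malformed is a conservative choice. It trades a higher manual-review rate for the guarantee that every automatic score comes from fully parseable reports.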

Taxonomy

Topics: Handwritten Text Recognition Techniques · Topic Modeling · Natural Language Processing Techniques