Grading Handwritten Engineering Exams with Multimodal Large Language Models
Janez Perš, Jon Muhovič, Andrej Košir, Boštjan Murovec

TL;DR
This paper introduces an end-to-end multimodal LLM-based system for grading handwritten engineering exams, achieving high accuracy and reliability while maintaining standard exam formats and providing auditable reports.
Contribution
It presents a novel workflow utilizing multimodal LLMs with structured prompting and reference grounding for reliable, scalable grading of handwritten STEM exams.
Findings
Achieves a mean absolute difference of approximately 8 points relative to lecturer grades.
Shows low bias and a 17% manual-review trigger rate at Dmax = 40.
Structured prompting and reference grounding are crucial for accuracy.
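The metrics named in the findings can be illustrated with a minimal sketch. The summary does not define how bias or Dmax is computed, so the code below rests on assumptions: bias is taken as the mean signed difference (model minus lecturer), and Dmax is interpreted as a threshold on the spread between the highest and lowest scores in the grader ensemble. All function names and sample numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mean_absolute_difference(model_grades, lecturer_grades):
    """Average of |model - lecturer| over all graded answers."""
    return mean(abs(m - l) for m, l in zip(model_grades, lecturer_grades))


def bias(model_grades, lecturer_grades):
    """Mean signed difference (assumed definition): positive means the model over-grades."""
    return mean(m - l for m, l in zip(model_grades, lecturer_grades))


def review_trigger_rate(ensemble_scores, d_max):
    """Fraction of answers whose grader spread exceeds d_max.

    ASSUMPTION: Dmax is read here as a limit on (max - min) of the
    ensemble scores; the paper may define it differently.
    """
    flagged = sum(1 for scores in ensemble_scores if max(scores) - min(scores) > d_max)
    return flagged / len(ensemble_scores)


if __name__ == "__main__":
    # Made-up illustrative grades on a 0-100 scale.
    model = [70, 55, 90, 40]
    lecturer = [80, 50, 85, 50]
    print(mean_absolute_difference(model, lecturer))
    print(bias(model, lecturer))
    print(review_trigger_rate([[60, 70, 65], [20, 75, 50]], d_max=40))
```

Under this reading, a larger Dmax tolerates more disagreement between graders before a human is asked to check the result, which lowers the review rate.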
Abstract
Handwritten STEM exams capture open-ended reasoning and diagrams, but manual grading is slow and difficult to scale. We present an end-to-end workflow for grading scanned handwritten engineering quizzes with multimodal large language models (LLMs) that preserves the standard exam process (A4 paper, unconstrained student handwriting). The lecturer provides only a handwritten reference solution (100%) and a short set of grading rules; the reference is converted into a text-only summary that conditions grading without exposing the reference scan. Reliability is achieved through a multi-stage design with a format/presence check to prevent grading blank answers, an ensemble of independent graders, supervisor aggregation, and rigid templates with deterministic validation to produce auditable, machine-parseable reports. We evaluate the frozen pipeline in a clean-room protocol on a held-out…
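The multi-stage design described in the abstract (presence check, independent graders, aggregation, deterministic template validation) can be sketched roughly as follows. This is not the authors' implementation. Graders are stand-in callables rather than multimodal LLM calls, the presence check is reduced to an empty-string test, the supervisor is replaced by a plain median, and the two-line report template is invented for illustration. Every name here is hypothetical.

```python
import re
from statistics import median
from typing import Callable, Sequence

# Hypothetical rigid template: exactly "SCORE: <int>" then "COMMENT: <one line>".
REPORT_PATTERN = re.compile(r"^SCORE: (\d{1,3})\nCOMMENT: (.+)$")

Grader = Callable[[str, str], str]  # (answer, reference_summary) -> report text


def parse_report(report: str) -> int:
    """Deterministically validate a grader report and extract its score."""
    match = REPORT_PATTERN.match(report)
    if match is None:
        raise ValueError("report does not follow the template")
    score = int(match.group(1))
    if not 0 <= score <= 100:
        raise ValueError("score outside 0-100")
    return score


def grade_answer(answer: str, reference_summary: str, graders: Sequence[Grader]) -> float:
    """Run the presence check, the grader ensemble, and a simple aggregation."""
    # Presence check (stand-in): never send a blank answer to the graders.
    if not answer.strip():
        return 0
    # Each grader works independently and must return a template-conformant report.
    scores = [parse_report(grader(answer, reference_summary)) for grader in graders]
    # ASSUMPTION: the median stands in for the supervisor aggregation step.
    return median(scores)
```

Rejecting any report that fails the template, rather than trying to repair it, keeps the output machine-parseable and makes failures visible instead of silently scoring malformed text.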
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Natural Language Processing Techniques
