Beyond Grading Accuracy: Exploring Alignment of TAs and LLMs

Matthijs Jansen op de Haar; Nacir Bouali; Faizan Ahmed

arXiv:2603.16357 · cs.CY · March 18, 2026

Open Access

TL;DR

This study evaluates open-source LLMs for grading UML class diagrams, comparing them with teaching assistants (TAs) at the level of individual criteria. The LLMs reach per-criterion accuracy of up to 88.56% and a Pearson correlation of up to 0.78 with human grades, suggesting they can support automated grading.

Contribution

The paper introduces a criterion-level grading pipeline that uses open-source LLMs to grade UML class diagrams, addressing the transparency and cost concerns that matter to universities in automated assessment.

Findings

01 Per-criterion accuracy up to 88.56%
02 Pearson correlation up to 0.78 with human grades
03 Optimal model combining best LLMs approaches TA performance

Abstract

In this paper, we investigate the potential of open-source Large Language Models (LLMs) for grading Unified Modeling Language (UML) class diagrams. In contrast to existing work, which primarily evaluates proprietary LLMs, we focus on non-proprietary models, making our approach suitable for universities where transparency and cost are critical. Additionally, existing studies assess performance over complete diagrams rather than individual criteria, offering limited insight into how automated grading aligns with human evaluation. To address these gaps, we propose a grading pipeline in which student-generated UML class diagrams are independently evaluated by both teaching assistants (TAs) and LLMs. Grades are then compared at the level of individual criteria. We evaluate this pipeline through a quantitative study of 92 UML class diagrams from a software design course, comparing TA grades…
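
The criterion-level comparison described in the abstract can be illustrated with a minimal sketch. It assumes each diagram is graded pass/fail (1/0) on every criterion, that per-criterion accuracy means the fraction of diagrams on which the LLM matches the TA, and that the Pearson correlation is taken over total grades per diagram. The criterion names and all grade values below are hypothetical toy data, not the paper's rubric or results.

```python
from math import sqrt

def per_criterion_accuracy(ta, llm):
    """Fraction of diagrams on which the LLM grade equals the TA grade, per criterion."""
    n = len(ta)
    return {c: sum(t[c] == l[c] for t, l in zip(ta, llm)) / n for c in ta[0]}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical grades: one dict per diagram, one 0/1 entry per criterion.
ta_grades = [
    {"classes": 1, "attributes": 1, "associations": 0},
    {"classes": 1, "attributes": 0, "associations": 1},
    {"classes": 0, "attributes": 1, "associations": 1},
    {"classes": 1, "attributes": 1, "associations": 1},
]
llm_grades = [
    {"classes": 1, "attributes": 1, "associations": 1},
    {"classes": 1, "attributes": 0, "associations": 1},
    {"classes": 0, "attributes": 0, "associations": 1},
    {"classes": 1, "attributes": 1, "associations": 1},
]

accuracy = per_criterion_accuracy(ta_grades, llm_grades)

# Total grade per diagram is assumed to be the sum of its criterion scores.
ta_totals = [sum(g.values()) for g in ta_grades]
llm_totals = [sum(g.values()) for g in llm_grades]
correlation = pearson(ta_totals, llm_totals)

print(accuracy)
print(round(correlation, 4))
```

Reporting agreement per criterion rather than only on total grades shows where an LLM diverges from the TA, which a single diagram-level score would hide.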

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Teaching and Learning Programming · Innovative Teaching and Learning Methods