Confidence Estimation in Automatic Short Answer Grading with LLMs

Longwei Cong; Sonja Hahn; Sebastian Gombert; Leon Camus; Hendrik Drachsler; Ulf Kroehne

arXiv:2605.00200·cs.CL·May 14, 2026

TL;DR

This paper compares confidence estimation methods for LLM-based automatic short answer grading and proposes a hybrid approach that combines model-based signals with dataset-derived uncertainty to make confidence estimates more reliable.

Contribution

The paper introduces a hybrid confidence framework that integrates model-based confidence signals with dataset-derived uncertainty to improve confidence estimation in LLM-based grading.

Findings

01. Hybrid confidence measure improves reliability over single-source methods.

02. Clustering semantically embedded responses quantifies response heterogeneity.

03. Proposed approach enhances trustworthiness in AI-assisted educational assessment.
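
To make findings 01 and 02 more concrete, here is a minimal sketch of how response heterogeneity might be derived from cluster assignments and then blended with a model-based confidence score. The page does not say how the paper computes either quantity, so everything here is an assumption for illustration: the normalized-entropy measure, the convex combination, the `weight` parameter, and the function names `cluster_heterogeneity` and `hybrid_confidence` are all hypothetical. The cluster labels are assumed to come from any clustering of semantically embedded student responses.

```python
import math
from collections import Counter


def cluster_heterogeneity(cluster_labels):
    """Normalized Shannon entropy of cluster sizes (assumed measure, not the paper's).

    Returns 0.0 when all responses fall into one cluster and 1.0 when
    responses are spread evenly over the clusters that occur.
    """
    counts = Counter(cluster_labels)
    k = len(counts)
    if k <= 1:
        return 0.0
    n = len(cluster_labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


def hybrid_confidence(model_conf, heterogeneity, weight=0.5):
    """Hypothetical convex combination of a model-based confidence in [0, 1]
    and a dataset-derived certainty term (1 - heterogeneity)."""
    return weight * model_conf + (1.0 - weight) * (1.0 - heterogeneity)


# Example: cluster ids for the responses to one question (made-up values).
labels = [0, 0, 1, 1]
h = cluster_heterogeneity(labels)
score = hybrid_confidence(model_conf=0.9, heterogeneity=h)
```

In this sketch a question whose responses split evenly across clusters pulls the hybrid score down even when the model itself reports high confidence, which is the intuition behind combining the two sources.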

Abstract

Automatic Short Answer Grading (ASAG) with generative large language models (LLMs) has recently demonstrated strong performance without task-specific fine-tuning, while also enabling the generation of synthetic feedback for educational assessment. Despite these advances, LLM-based grading remains imperfect, making reliable confidence estimates essential for safe and effective human-AI collaboration in educational decision-making. In this work, we investigate confidence estimation for ASAG with LLMs by jointly considering model-based confidence signals and dataset-derived uncertainty. We systematically compare three model-based confidence estimation strategies, namely verbalizing, latent, and consistency-based confidence estimation, and show that model-based confidence alone is insufficient to reliably capture uncertainty in ASAG. To address this limitation, we propose a hybrid…
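
The three model-based strategies named in the abstract can be illustrated with a short sketch. This is one common way to compute each signal, not necessarily the authors' procedure: the function names, the input formats (a self-reported percentage string, a dictionary of label log-probabilities, a list of repeatedly sampled grades), and the example values are all assumptions.

```python
import math
from collections import Counter


def verbalized_confidence(raw):
    """Parse a self-reported confidence such as '85' or '85%' into [0, 1]."""
    value = float(raw.strip().rstrip("%"))
    return min(max(value / 100.0, 0.0), 1.0)


def latent_confidence(label_logprobs):
    """Softmax over the log-probabilities of the candidate grade labels.

    Returns the most probable label and its renormalized probability.
    """
    m = max(label_logprobs.values())
    exp = {k: math.exp(v - m) for k, v in label_logprobs.items()}
    z = sum(exp.values())
    probs = {k: v / z for k, v in exp.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


def consistency_confidence(sampled_grades):
    """Majority grade over repeated samples and the share that agree with it."""
    counts = Counter(sampled_grades)
    grade, n = counts.most_common(1)[0]
    return grade, n / len(sampled_grades)


# Made-up inputs standing in for LLM outputs on one student response.
v = verbalized_confidence("85%")
label, p = latent_confidence({"correct": math.log(0.6), "incorrect": math.log(0.2)})
grade, agreement = consistency_confidence(["correct", "correct", "incorrect", "correct"])
```

Each function returns a score in [0, 1], so the three signals can be compared on the same scale when checking how well they track grading errors.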
