SteLLA: A Structured Grading System Using LLMs with RAG

Hefei Qiu; Brian White; Ashley Ding; Reinaldo Costa; Ali Hachem; Wei Ding; Ping Chen

arXiv:2501.09092 · cs.CL · May 26, 2025

TL;DR

This paper introduces SteLLA, a novel structured grading system that uses Retrieval-Augmented Generation (RAG) with LLMs to improve the accuracy of automated short answer grading (ASAG) and to provide detailed feedback, reaching substantial agreement with human graders.

Contribution

The paper proposes a new LLM-based grading system that uses RAG to extract structured information from the instructor-provided reference answer and rubric and then evaluates student answers against it, improving reliability and feedback quality in ASAG tasks.
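
The retrieval step can be pictured with a small, self-contained sketch. The page does not say which retriever or prompt SteLLA uses, so this is a hedged illustration only: a bag-of-words cosine similarity stands in for whatever embedding-based retriever the authors may use, and `retrieve`, `build_prompt`, the prompt wording, and the toy knowledge base are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: return the k chunks most similar to the query."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str], student_answer: str) -> str:
    """Hypothetical prompt layout: retrieved reference material first, then the answer to grade."""
    bullets = "\n".join(f"- {c}" for c in context)
    return (
        f"Question: {question}\n"
        f"Reference material:\n{bullets}\n"
        f"Student answer: {student_answer}\n"
        "Grade the student answer using only the reference material."
    )


# Toy knowledge base standing in for a made-up reference answer and rubric.
knowledge_base = [
    "Enzymes lower the activation energy of a reaction.",
    "Enzymes are not consumed by the reaction they catalyze.",
    "Mitochondria produce ATP through cellular respiration.",
]
question = "How do enzymes lower activation energy?"
prompt = build_prompt(question, retrieve(question, knowledge_base, k=2),
                      "They make the reaction need less energy to start.")
print(prompt)
```

Grounding the grader in the retrieved rubric text, rather than in the model's general knowledge, is the point of the RAG step; the irrelevant mitochondria chunk never reaches the prompt here.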

Findings

1. Achieves substantial agreement with human graders.

2. Provides a detailed breakdown of grades and feedback.

3. GPT-4 captures facts effectively but may infer too much from the given context.
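
The page does not state which statistic backs "substantial agreement". That phrase is conventionally tied to Cohen's kappa on the Landis and Koch scale (0.61 to 0.80), so the sketch below assumes kappa. The grades are made-up toy values, not data from the paper, and `cohen_kappa` and `landis_koch` are hypothetical helper names.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items with categorical grades."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same grade by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies, summed over grades.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[g] * freq_b[g] for g in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical grade everywhere: kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Made-up toy grades (0 to 2 points) for ten answers; not data from the paper.
human = [2, 1, 0, 2, 2, 1, 0, 2, 1, 2]
model = [2, 1, 0, 2, 1, 1, 0, 2, 2, 2]
kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")
```

For these toy grades the raters agree on 8 of 10 answers (0.8), chance agreement is 0.38, and kappa is about 0.677, which falls in the "substantial" band. Kappa is used instead of raw accuracy because it discounts agreement that would occur by chance.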

Abstract

Large Language Models (LLMs) have shown strong general capabilities in many applications. However, making them reliable tools for specific tasks such as automated short answer grading (ASAG) remains a challenge. We present SteLLA (Structured Grading System Using LLMs with RAG), in which (a) a Retrieval-Augmented Generation (RAG) approach empowers LLMs on the ASAG task by extracting structured information from highly relevant and reliable external knowledge based on the instructor-provided reference answer and rubric, and (b) an LLM performs a structured, question-answering-based evaluation of student answers to provide analytical grades and feedback. A real-world dataset containing students' answers to an exam was collected from a college-level Biology course. Experiments show that our proposed system can achieve substantial agreement with the human…
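
Stage (b), the structured, question-answering-based evaluation, can be sketched as follows under stated assumptions. The schema is hypothetical: each rubric item is assumed to become a yes/no question worth some points. The page does not show SteLLA's prompts or data format. `KnowledgePoint`, `grade`, and `keyword_judge` are illustrative names, and `keyword_judge` is a deterministic keyword check standing in for the LLM call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KnowledgePoint:
    """One rubric item phrased as a yes/no question (hypothetical schema)."""
    question: str
    required_terms: tuple[str, ...]
    points: float


@dataclass
class PointResult:
    """Per-item outcome used to build the analytical grade and feedback."""
    question: str
    awarded: float
    possible: float
    feedback: str


def keyword_judge(point: KnowledgePoint, answer: str) -> bool:
    """Stand-in for the LLM: does the answer mention every required term?"""
    text = answer.lower()
    return all(term.lower() in text for term in point.required_terms)


def grade(answer: str, rubric: list[KnowledgePoint],
          judge: Callable[[KnowledgePoint, str], bool] = keyword_judge
          ) -> tuple[float, list[PointResult]]:
    """Ask one question per knowledge point; return the total and a per-point breakdown."""
    results = []
    for kp in rubric:
        met = judge(kp, answer)
        feedback = "Addressed." if met else "Not addressed: " + ", ".join(kp.required_terms)
        results.append(PointResult(kp.question, kp.points if met else 0.0, kp.points, feedback))
    return sum(r.awarded for r in results), results


# Made-up rubric and answer for illustration only.
rubric = [
    KnowledgePoint("Does the answer say enzymes lower activation energy?", ("activation energy",), 2.0),
    KnowledgePoint("Does the answer say enzymes are not consumed?", ("not consumed",), 1.0),
]
total, breakdown = grade("Enzymes speed up reactions by lowering the activation energy.", rubric)
for r in breakdown:
    print(f"{r.awarded}/{r.possible}  {r.question}  {r.feedback}")
print("Total:", total)
```

Passing `judge` as a parameter keeps the per-point breakdown logic separate from how each question is answered, so an LLM-backed judge could replace the keyword check without changing `grade`.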

Taxonomy

Topics: Neural Networks and Applications