A LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization

Yucheng Chu; Hang Li; Kaiqi Yang; Harry Shomer; Hui Liu; Yasemin Copur-Gencturk; Jiliang Tang

arXiv:2410.02165 · cs.AI · June 5, 2025

TL;DR

This paper introduces GradeOpt, an LLM-powered multi-agent framework that automatically optimizes grading guidelines for open-ended short-answer questions, achieving human-level accuracy and consistency in automated grading tasks.

Contribution

It presents a novel multi-agent system with self-reflection capabilities that enhances automatic short-answer grading accuracy and generalizability.

Findings

01. GradeOpt outperforms baseline models in grading accuracy.
02. The framework aligns well with human grading behavior.
03. Ablation studies validate the effectiveness of each component.
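
This page does not say which metrics support these findings. As an assumption, agreement between an automatic grader and a human grader is commonly reported as raw accuracy (observed agreement p_o) together with Cohen's kappa, which corrects for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The minimal sketch below computes both for integer scores. The function names and sample scores are illustrative only, not data or code from the paper.

```python
from collections import Counter
from typing import Sequence


def accuracy(pred: Sequence[int], human: Sequence[int]) -> float:
    """Observed agreement p_o: share of responses given the same score."""
    if len(pred) != len(human) or not pred:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(p == h for p, h in zip(pred, human)) / len(pred)


def cohens_kappa(pred: Sequence[int], human: Sequence[int]) -> float:
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    p_o = accuracy(pred, human)
    n = len(pred)
    cp, ch = Counter(pred), Counter(human)
    # Chance agreement p_e from each rater's marginal score frequencies.
    p_e = sum(cp[k] * ch[k] for k in cp.keys() | ch.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention chosen here: both raters used one identical label
    return (p_o - p_e) / (1 - p_e)


# Illustrative scores on a 0-2 scale (hypothetical, not from the paper).
model = [2, 1, 0, 2, 1, 1]
human = [2, 1, 0, 1, 1, 2]
print(accuracy(model, human))      # about 0.667
print(cohens_kappa(model, human))  # about 0.455
```

Here the grader matches the human on 4 of 6 responses, but because both raters use the score 1 often, part of that agreement is expected by chance, so kappa (10/22) is noticeably lower than accuracy.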

Abstract

Open-ended short-answer questions (SAGs) have been widely recognized as a powerful tool for providing deeper insights into learners' responses in the context of learning analytics (LA). However, SAGs often present challenges in practice due to the high grading workload and concerns about inconsistent assessments. With recent advancements in natural language processing (NLP), automatic short-answer grading (ASAG) offers a promising solution to these challenges. Despite this, current ASAG algorithms are often limited in generalizability and tend to be tailored to specific questions. In this paper, we propose a unified multi-agent ASAG framework, GradeOpt, which leverages large language models (LLMs) as graders for SAGs. More importantly, GradeOpt incorporates two additional LLM-based agents, the reflector and the refiner, into the multi-agent system. This enables GradeOpt to…
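
The abstract names three LLM-based roles (grader, reflector, refiner) but is cut off before describing how they interact. The sketch below assumes a simple guideline-optimization loop: the grader scores training responses under the current guideline, the reflector diagnoses disagreements with human scores, the refiner rewrites the guideline, and a revision is kept only if held-out accuracy improves. The accept-if-better rule, all function names, and the keyword-matching stand-ins for LLM calls are hypothetical; this is not GradeOpt's actual prompts or algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, int]     # (student response, human score)
Error = Tuple[str, int, int]  # (response, predicted score, human score)


@dataclass
class Result:
    guideline: str
    val_accuracy: float


def accuracy(grader: Callable[[str, str], int], guideline: str, data: List[Example]) -> float:
    """Fraction of responses whose predicted score equals the human score."""
    if not data:
        return 0.0
    return sum(grader(guideline, r) == h for r, h in data) / len(data)


def optimize_guideline(guideline: str, train: List[Example], val: List[Example],
                       grader: Callable[[str, str], int],
                       reflector: Callable[[str, List[Error]], str],
                       refiner: Callable[[str, str], str],
                       rounds: int = 3) -> Result:
    best = Result(guideline, accuracy(grader, guideline, val))
    for _ in range(rounds):
        # Grader: score training responses under the current guideline.
        errors: List[Error] = []
        for r, h in train:
            p = grader(best.guideline, r)
            if p != h:
                errors.append((r, p, h))
        if not errors:
            break  # guideline already matches every human score
        # Reflector: diagnose the disagreements.
        feedback = reflector(best.guideline, errors)
        # Refiner: rewrite the guideline using that diagnosis.
        candidate = refiner(best.guideline, feedback)
        # Assumed rule: accept the revision only if held-out accuracy improves.
        cand_acc = accuracy(grader, candidate, val)
        if cand_acc > best.val_accuracy:
            best = Result(candidate, cand_acc)
    return best


# --- Hypothetical keyword-based stand-ins for the three LLM agents ---
def _keywords(guideline: str) -> List[str]:
    return [k.strip() for k in guideline.split(",") if k.strip()]


def toy_grader(guideline: str, response: str) -> int:
    # Full credit (1) only if every guideline keyword appears in the response.
    return int(all(k in response.lower() for k in _keywords(guideline)))


def toy_reflector(guideline: str, errors: List[Error]) -> str:
    # Name the keyword most often missing from under-scored answers.
    under = [r.lower() for r, p, h in errors if p < h]
    counts = {k: sum(k not in r for r in under) for k in _keywords(guideline)}
    return max(counts, key=counts.get) if counts else ""


def toy_refiner(guideline: str, feedback: str) -> str:
    # Drop the keyword the reflector flagged.
    return ", ".join(k for k in _keywords(guideline) if k != feedback)


train = [("The slope is rise over run", 1), ("Slope and intercept", 1), ("I don't know", 0)]
val = [("Slope means steepness", 1), ("It is a number", 0)]
result = optimize_guideline("slope, intercept", train, val,
                            toy_grader, toy_reflector, toy_refiner)
print(result)  # Result(guideline='slope', val_accuracy=1.0)
```

Injecting the three agents as callables keeps the loop independent of any particular LLM client; in a real system each stand-in would be replaced by a prompted model call.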

Taxonomy

Topics: Educational Technology and Assessment