Grade Like a Human: Rethinking Automated Assessment with Large Language Models
Wenjing Xie, Juxin Niu, Chun Jason Xue, Nan Guan

TL;DR
This paper presents a comprehensive LLM-based automated grading system that improves the entire grading process, including rubric design, scoring, and review, demonstrating effectiveness on new and existing datasets.
Contribution
It introduces a holistic approach to automated grading with LLMs that covers rubric creation, scoring, and post-grading review, whereas prior methods focus only on scoring against predefined rubrics.
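To make the three stages concrete, here is a minimal sketch of how rubric creation, scoring, and post-grading review could be chained together. It is an illustration under stated assumptions, not the authors' implementation. Every name in it (`design_rubric`, `score_answer`, `grade_all`, `fake_llm`), the prompt wording, and the review rule are hypothetical. The review rule, which flags an answer when two independent scoring passes disagree by more than a tolerance, is a simple stand-in chosen for this sketch. The model is passed in as a plain function, so no particular LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an "LLM" is any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class GradedAnswer:
    answer: str
    score: float
    needs_review: bool


def design_rubric(llm: LLM, question: str, sample_answers: List[str]) -> str:
    """Stage 1: ask the model for a rubric, showing it the question and student answers."""
    joined = "\n".join(f"- {a}" for a in sample_answers)
    return llm(f"RUBRIC\nQuestion: {question}\nSample answers:\n{joined}")


def score_answer(llm: LLM, question: str, rubric: str, answer: str,
                 max_score: float) -> float:
    """Stage 2: ask the model for a numeric score and clamp it to [0, max_score]."""
    reply = llm(f"SCORE\nQuestion: {question}\nRubric: {rubric}\nAnswer: {answer}")
    try:
        value = float(reply.strip())
    except ValueError:
        value = 0.0  # an unparseable reply is treated as zero in this sketch
    return min(max(value, 0.0), max_score)


def grade_all(llm: LLM, question: str, answers: List[str],
              max_score: float = 10.0, tolerance: float = 1.0) -> List[GradedAnswer]:
    """Run all three stages; stage 3 flags answers whose two scoring passes disagree."""
    rubric = design_rubric(llm, question, answers)
    results = []
    for answer in answers:
        first = score_answer(llm, question, rubric, answer, max_score)
        second = score_answer(llm, question, rubric, answer, max_score)
        flagged = abs(first - second) > tolerance
        results.append(GradedAnswer(answer, (first + second) / 2, flagged))
    return results


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to exercise the pipeline."""
    if prompt.startswith("RUBRIC"):
        return "Award points for mentioning scheduling."
    student_answer = prompt.split("Answer: ")[-1]
    return "8" if "scheduling" in student_answer else "2"
```

Calling `grade_all(fake_llm, question, answers)` with the stub returns one `GradedAnswer` per student answer. Swapping `fake_llm` for a function that calls a real model leaves the rest of the pipeline unchanged.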
Findings
Effective grading accuracy on new OS dataset
Improved consistency and fairness in scoring
Insights into LLM capabilities for comprehensive grading
Abstract
While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps, such as grading rubrics design and post-grading review. There has been a lack of systematic research exploring the potential of LLMs to enhance the entire grading process. In this paper, we propose an LLM-based grading system that addresses the entire grading procedure, including the following key components: 1) Developing grading rubrics that not only consider the questions but also the student answers, which can more accurately reflect students' performance. 2) Under the guidance of grading…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
