Auditing an Automatic Grading Model with deep Reinforcement Learning

Aubrey Condor; Zachary Pardos

arXiv:2405.07087·cs.AI·May 14, 2024

TL;DR

This paper uses deep reinforcement learning to audit an automatic short answer grading model, showing that high agreement with human ratings can coexist with vulnerabilities that agreement metrics do not reveal.

Contribution

It introduces a reinforcement learning approach to systematically challenge and evaluate the robustness of automatic grading models.

Findings

01. Reinforcement learning can identify weaknesses in grading models.

02. High agreement scores do not guarantee model reliability.

03. Automated graders can be exploited through response revisions.
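
To make "agreement scores" in finding 02 concrete, the sketch below computes two common agreement measures between human and model scores: plain accuracy and Cohen's kappa. The choice of these two metrics and the function names are assumptions for illustration; this page does not say which metrics the paper reports.

```python
from collections import Counter


def accuracy(human, model):
    """Fraction of responses where the model score equals the human score."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(human)
    p_o = accuracy(human, model)
    h_counts, m_counts = Counter(human), Counter(model)
    labels = set(human) | set(model)
    # Expected agreement if both raters scored independently
    # with their observed label frequencies.
    p_e = sum(h_counts[c] * m_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every response.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Both numbers are computed only on responses that already exist in the scored set, so they say nothing about how the grader behaves on revised responses it has never seen. That is the gap the audit targets.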

Abstract

We explore the use of deep reinforcement learning to audit an automatic short answer grading (ASAG) model. Automatic grading may decrease the time burden of rating open-ended items for educators, but a lack of robust evaluation methods for these models can result in uncertainty about their quality. Current state-of-the-art ASAG models are configured to match human ratings from a training set, and researchers typically assess their quality with accuracy metrics that signify agreement between model and human scores. In this paper, we show that a high level of agreement with human ratings does not give sufficient evidence that an ASAG model is infallible. We train a reinforcement learning agent to revise student responses with the objective of achieving a high rating from an automatic grading model in the fewest revisions. By analyzing the agent's revised responses that achieve a high…
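
The revision task described in the abstract can be framed as an episodic environment. The following is a minimal sketch under stated assumptions, not the authors' setup: `toy_grader`, `PHRASES`, `RevisionEnv`, and the reward values are all hypothetical stand-ins. Each action appends one phrase to the response, each revision costs a step penalty, and the episode ends when the grader awards the target score or the revision budget runs out.

```python
# Hypothetical phrases the agent may append; not taken from the paper.
PHRASES = [
    "because energy is conserved",
    "the mass stays the same",
    "I think",
    "due to gravity",
]


def toy_grader(response: str) -> int:
    """Stand-in grader (NOT the audited ASAG model): one point per keyword."""
    return sum(keyword in response for keyword in ("energy", "mass"))


class RevisionEnv:
    """Episode: revise one response until the grader gives the target score."""

    def __init__(self, original, target_score=2, max_revisions=5,
                 step_penalty=-1.0, success_bonus=10.0):
        self.original = original
        self.target_score = target_score
        self.max_revisions = max_revisions
        self.step_penalty = step_penalty      # assumed cost per revision
        self.success_bonus = success_bonus    # assumed reward for a high rating
        self.reset()

    def reset(self):
        self.response = self.original
        self.revisions = 0
        return self.response

    def step(self, action):
        # Apply the revision: append the chosen phrase to the response.
        self.response = f"{self.response} {PHRASES[action]}"
        self.revisions += 1
        solved = toy_grader(self.response) >= self.target_score
        reward = self.step_penalty + (self.success_bonus if solved else 0.0)
        done = solved or self.revisions >= self.max_revisions
        return self.response, reward, done


def rollout(env, policy):
    """Run one episode; `policy` maps the current response to an action index."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(policy(env.response))
        total += reward
    return total, env.revisions
```

Because every revision is penalized, a policy that maximizes return is pushed toward the shortest sequence of edits that earns the high rating. In this toy setting, a policy that appends the first two phrases in order finishes in two revisions, while one that only appends "I think" exhausts the budget without success.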

Taxonomy

Topics: Non-Destructive Testing Techniques