ASAG2024: A Combined Benchmark for Short Answer Grading

Gérôme Meyer; Philip Breuer; Jonathan Fürst

arXiv:2409.18596·cs.AI·September 30, 2024

TL;DR

The paper introduces ASAG2024, a benchmark that combines seven commonly used short-answer grading datasets under a common structure and grading scale, so that automated grading systems can be evaluated and compared across subjects, grading scales, and distributions.

Contribution

It presents the first unified benchmark for short answer grading, enabling systematic evaluation of SAG methods across diverse datasets and scales.

Findings

01. LLM-based SAG approaches outperform previous methods

02. Current automated systems still lag behind human grading performance

03. The benchmark facilitates future research in generalizable SAG solutions

Abstract

Open-ended questions test a more thorough understanding than closed-ended questions and are often a preferred assessment method. However, open-ended questions are tedious to grade and subject to personal bias. Therefore, there have been efforts to speed up the grading process through automation. Short Answer Grading (SAG) systems aim to automatically score students' answers. Despite growth in SAG methods and capabilities, there exists no comprehensive short-answer grading benchmark across different subjects, grading scales, and distributions. Thus, it is hard to assess the capabilities of current automated grading methods in terms of their generalizability. In this preliminary work, we introduce the combined ASAG2024 benchmark to facilitate the comparison of automated grading systems. The benchmark combines seven commonly used short-answer grading datasets in a common structure and grading scale.…
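
As a rough illustration of what "a common structure and grading scale" could involve, the sketch below min-max normalizes grades from two toy datasets with different scales onto [0, 1] and stores them in one record type. The record fields (`question`, `reference_answer`, `student_answer`, `grade`), the dataset names, and the scale bounds are hypothetical and chosen only for illustration; the schema and normalization actually used by ASAG2024 may differ.

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    # Hypothetical unified record; field names are assumptions,
    # not the schema published with the benchmark.
    source: str
    question: str
    reference_answer: str
    student_answer: str
    normalized_grade: float  # value in [0, 1]


def normalize_grade(grade, min_grade, max_grade):
    """Min-max scale a raw grade onto [0, 1]."""
    if max_grade <= min_grade:
        raise ValueError("max_grade must be greater than min_grade")
    if not (min_grade <= grade <= max_grade):
        raise ValueError(f"grade {grade} is outside [{min_grade}, {max_grade}]")
    return (grade - min_grade) / (max_grade - min_grade)


def unify(source, rows, min_grade, max_grade):
    """Convert one dataset's rows into the shared record type."""
    return [
        GradedAnswer(
            source=source,
            question=row["question"],
            reference_answer=row["reference_answer"],
            student_answer=row["student_answer"],
            normalized_grade=normalize_grade(row["grade"], min_grade, max_grade),
        )
        for row in rows
    ]


# Toy rows (invented for this example): one dataset graded 0-5, one graded 0-2.
dataset_a = [{
    "question": "What does a compiler do?",
    "reference_answer": "It translates source code into machine code.",
    "student_answer": "It turns code into something the machine can run.",
    "grade": 4.0,
}]
dataset_b = [{
    "question": "Why does ice float on water?",
    "reference_answer": "Ice is less dense than liquid water.",
    "student_answer": "Because it is lighter.",
    "grade": 1,
}]

combined = unify("dataset_a", dataset_a, 0, 5) + unify("dataset_b", dataset_b, 0, 2)
for record in combined:
    print(record.source, record.normalized_grade)
```

Min-max scaling is only one option; it keeps grades comparable across scales but does not correct for differences in grade distributions between datasets.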

Code & Models

Datasets

Meyerger/ASAG2024
dataset · 121 dl

Taxonomy

Methods: Sparse Evolutionary Training · Self-Attention Guidance · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings