S-GRADES -- Studying Generalization of Student Response Assessments in Diverse Evaluative Settings

Tasfia Seuti; Sagnik Ray Choudhury

arXiv:2603.10233 · cs.CL · March 12, 2026

Open Access

TL;DR

This paper introduces S-GRADES, a comprehensive benchmark unifying diverse student response assessment datasets to evaluate and improve the generalization of automated grading models across different evaluative settings.

Contribution

The paper presents S-GRADES, a new open-source benchmark that consolidates multiple datasets and evaluation protocols for student response grading, enabling standardized and extensible assessment.

Findings

01. Large language models show varying performance across datasets.

02. Exemplar selection impacts grading accuracy and transferability.

03. Benchmark reveals gaps in model reliability and generalization.

Abstract

Evaluating student responses, from long essays to short factual answers, is a key challenge in educational NLP. Automated Essay Scoring (AES) focuses on holistic writing qualities such as coherence and argumentation, while Automatic Short Answer Grading (ASAG) emphasizes factual correctness and conceptual understanding. Despite their shared goal, these paradigms have progressed in isolation with fragmented datasets, inconsistent metrics, and separate communities. We introduce S-GRADES (Studying Generalization of Student Response Assessments in Diverse Evaluative Settings), a web-based benchmark that consolidates 14 diverse grading datasets under a unified interface with standardized access and reproducible evaluation protocols. The benchmark is fully open-source and designed for extensibility, enabling continuous integration of new datasets and evaluation settings. To demonstrate the…

Taxonomy

Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Text Readability and Simplification