Unveiling the Tapestry of Automated Essay Scoring: A Comprehensive Investigation of Accuracy, Fairness, and Generalizability
Kaixun Yang, Mladen Raković, Yuyang Li, Quanlong Guan, Dragan Gašević, Guanliang Chen

TL;DR
This study evaluates nine AES methods on accuracy, fairness, and generalizability using a large dataset, revealing trade-offs between prompt-specific and cross-prompt models and highlighting the importance of model choice for equitable assessment.
Contribution
It provides a comprehensive comparison of AES models across multiple metrics, emphasizing the impact of model type and prompt specificity on bias and performance.
Findings
Prompt-specific models outperform cross-prompt models in accuracy.
Prompt-specific models show greater bias related to economic status.
Traditional models with engineered features can achieve high accuracy and fairness.
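To make the accuracy and bias findings above concrete, here is a minimal Python sketch of how the two quantities might be measured. It is not the paper's evaluation code. This summary does not name the metrics used, so the choices here are assumptions: quadratic weighted kappa (a widely used agreement measure in essay scoring) stands in for accuracy, and the per-group mean signed error stands in for bias. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def quadratic_weighted_kappa(human, predicted, min_score, max_score):
    """Agreement between two lists of integer scores (1.0 = perfect agreement).

    Assumes max_score > min_score and that the scores are not all identical,
    otherwise the denominator is zero.
    """
    n = max_score - min_score + 1
    # observed[i][j] counts essays scored i by humans and j by the model
    observed = [[0] * n for _ in range(n)]
    for h, p in zip(human, predicted):
        observed[h - min_score][p - min_score] += 1

    total = len(human)
    hist_human = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    disagreement = 0.0  # weighted observed disagreement
    expected = 0.0      # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            disagreement += weight * observed[i][j]
            expected += weight * hist_human[i] * hist_pred[j] / total
    return 1.0 - disagreement / expected


def mean_signed_error_by_group(human, predicted, groups):
    """Average of (predicted - human) per group; an illustrative bias proxy."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h, p, g in zip(human, predicted, groups):
        sums[g] += p - h
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Toy data (made up): human scores, model scores, and a group label per essay.
human_scores = [3, 3, 3, 3]
model_scores = [2, 2, 3, 4]
group_labels = ["a", "a", "b", "b"]
gaps = mean_signed_error_by_group(human_scores, model_scores, group_labels)
```

A large difference between the per-group values in `gaps` would suggest the model systematically under-scores or over-scores one group relative to another, which is the kind of disparity the findings describe for economic status.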
Abstract
Automatic Essay Scoring (AES) is a well-established educational pursuit that employs machine learning to evaluate student-authored essays. While much effort has been made in this area, current research primarily focuses on either (i) boosting the predictive accuracy of an AES model for a specific prompt (i.e., developing prompt-specific models), which often relies heavily on labeled data from the same target prompt; or (ii) assessing the applicability of AES models developed on non-target prompts to the intended target prompt (i.e., developing the AES models in a cross-prompt setting). Given the inherent bias in machine learning and its potential impact on marginalized groups, it is imperative to investigate whether such bias exists in current AES methods and, if identified, how it interacts with an AES model's accuracy and generalizability. Thus, our study aimed to…
Taxonomy
Topics: Online Learning and Analytics · Text Readability and Simplification · Topic Modeling
