Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring
Hong Jiao, Hanna Choi, Haowei Hua

TL;DR
This paper investigates how rationales generated by GPT-4.1 and GPT-5 can be used to improve automated essay scoring, comparing their utility to traditional essay-based methods and exploring ensemble approaches.
Contribution
It introduces the use of large language model rationales in automated essay scoring and evaluates their effectiveness, demonstrating that ensemble models combining essay-based and rationale-based scoring achieve the highest accuracy.
Findings
Essay-based scoring outperforms rationale-based scoring overall.
Rationale-based scoring improves accuracy for less-represented classes.
Ensemble models combining essay and rationale scoring achieve the best results.
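The summary does not say how the ensembles were formed. As a minimal sketch of one common option, the example below assumes soft voting: each model outputs a probability per score level, the probabilities are averaged (optionally with weights), and the highest average wins. The function name `soft_vote`, the three-level toy probabilities, and the equal default weights are all hypothetical and are not taken from the paper.

```python
def soft_vote(prob_sets, weights=None):
    """Average per-class probabilities from several models (soft voting).

    prob_sets: list with one entry per model; each entry is a list of
               per-essay probability rows (one probability per score level).
    weights:   optional per-model weights; equal weights are assumed if omitted.
    Returns the predicted score level (argmax of the averaged row) per essay.
    NOTE: illustrative assumption only, not the ensemble method of the paper.
    """
    if not prob_sets:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(prob_sets)
    if len(weights) != len(prob_sets):
        raise ValueError("one weight per model is required")

    total = float(sum(weights))
    n_essays = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])

    predictions = []
    for e in range(n_essays):
        # Weighted average of the probability assigned to each score level.
        averaged = [
            sum(w * probs[e][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=lambda c: averaged[c]))
    return predictions


# Toy probabilities for two essays and three score levels (made-up values).
essay_probs = [[0.1, 0.6, 0.3], [0.5, 0.4, 0.1]]
rationale_probs = [[0.2, 0.3, 0.5], [0.7, 0.2, 0.1]]
ensemble_pred = soft_vote([essay_probs, rationale_probs])
```

With these toy inputs, the rationale-based model alone would assign the first essay to level 2, while the averaged probabilities favor level 1. Adding a third entry to `prob_sets` would correspond to combining the essay-based model with both rationale-based models.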
Abstract
This study explored the utilities of rationales generated by GPT-4.1 and GPT-5 in automated scoring, using Prompt 6 essays from the 2012 Kaggle ASAP data. Essay-based scoring was compared with rationale-based scoring. In general, essay-based scoring performed better than rationale-based scoring, with higher Quadratic Weighted Kappa (QWK). However, rationale-based scoring yielded higher F1 scores for score 0, which was underrepresented because of class imbalance. Ensembling the essay-based scoring models increased scoring accuracy both at specific score levels and across all score levels. Ensembles of essay-based scoring with either one of the rationale-based scoring models performed about the same. A further ensemble of essay-based scoring and both rationale-based scoring models yielded the best scoring accuracy, with a QWK of 0.870 compared…
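For readers unfamiliar with the metric, QWK measures agreement between two sets of ordinal scores, penalizing each disagreement by the squared distance between the two score levels and correcting for chance agreement. The sketch below follows the standard definition, kappa = 1 - sum(w * O) / sum(w * E) with w = (i - j)^2 / (K - 1)^2. The function name and the five-level toy data (scores 0 to 4) are assumptions for illustration, not the paper's code or data.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic Weighted Kappa for integer scores in 0..num_classes-1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    if num_classes < 2:
        raise ValueError("at least two score levels are required")

    # Observed agreement matrix: rows are true scores, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms, used to build the chance-expected matrix.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Every score sits at one level in both vectors: treat as full agreement.
        return 1.0
    return 1.0 - numerator / denominator


# Toy check on made-up scores (not data from the paper).
human = [0, 1, 2, 3, 4]
qwk_perfect = quadratic_weighted_kappa(human, [0, 1, 2, 3, 4], 5)
qwk_reversed = quadratic_weighted_kappa(human, [4, 3, 2, 1, 0], 5)
```

Identical score vectors give a QWK of 1, and fully reversed scores in this toy case give -1. In practice, `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"` computes the same quantity.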
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Psychometric Methodologies and Testing · Topic Modeling
