Operationalizing Automated Essay Scoring: A Human-Aware Approach

Yenisel Plasencia-Calaña

arXiv:2506.21603·cs.CL·October 20, 2025

Open Access

TL;DR

This paper compares machine learning and large language models for automated essay scoring, focusing on human-centric aspects like bias, explainability, and robustness to improve trustworthiness.

Contribution

It provides a comparative analysis of ML and LLM approaches for AES, highlighting their strengths, weaknesses, and trade-offs in human-aware operationalization.

Findings

01. ML models outperform LLMs in accuracy

02. LLMs offer richer explanations

03. Both struggle with bias and robustness

Abstract

This paper explores the human-centric operationalization of Automated Essay Scoring (AES) systems, addressing aspects beyond accuracy. We compare various machine learning (ML)-based approaches with Large Language Model (LLM) approaches, identifying their strengths, similarities, and differences. The study investigates key dimensions such as bias, robustness, and explainability, which are considered important for human-aware operationalization of AES systems. Our study shows that ML-based AES models outperform LLMs in accuracy but struggle with explainability, whereas LLMs provide richer explanations. We also find that both approaches struggle with bias and robustness to edge scores. By analyzing these dimensions, the paper aims to identify challenges and trade-offs between different methods, contributing to more reliable and trustworthy AES methods.

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications