Operationalizing Automated Essay Scoring: A Human-Aware Approach
Yenisel Plasencia-Calaña

TL;DR
This paper compares machine learning models and large language models for automated essay scoring, focusing on human-centric aspects such as bias, explainability, and robustness to improve trustworthiness.
Contribution
It provides a comparative analysis of ML and LLM approaches for AES, highlighting their strengths, weaknesses, and trade-offs in human-aware operationalization.
Findings
ML models outperform LLMs in accuracy
LLMs offer richer explanations
Both struggle with bias and robustness
Abstract
This paper explores the human-centric operationalization of Automated Essay Scoring (AES) systems, addressing aspects beyond accuracy. We compare various machine learning (ML) approaches with Large Language Model (LLM) approaches, identifying their strengths, similarities, and differences. The study investigates key dimensions such as bias, robustness, and explainability, which are considered important for human-aware operationalization of AES systems. Our study shows that ML-based AES models outperform LLMs in accuracy but struggle with explainability, whereas LLMs provide richer explanations. We also find that both approaches struggle with bias and with robustness to edge scores. By analyzing these dimensions, the paper aims to identify challenges and trade-offs between different methods, contributing to more reliable and trustworthy AES methods.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
