Can Large Language Models Differentiate Harmful from Argumentative Essays? Steps Toward Ethical Essay Scoring

Hongjin Kim; Jeonghyun Kang; Harksoo Kim

arXiv:2601.05545·cs.CL·January 12, 2026

TL;DR

This paper evaluates the ability of Large Language Models to detect harmful content in essays, highlighting current limitations and emphasizing the need for ethically aware automated scoring systems.

Contribution

Introduces the Harmful Essay Detection (HED) benchmark to assess how effectively LLMs recognize ethically problematic content in essays.

Findings

01 LLMs need improvement to distinguish harmful essays from argumentative ones

02 Current AES models often overlook ethical considerations in scoring

03 The study highlights the importance of ethical sensitivity in automated essay scoring

Abstract

This study addresses critical gaps in Automated Essay Scoring (AES) systems and Large Language Models (LLMs) with regard to their ability to effectively identify and score harmful essays. Despite advancements in AES technology, current models often overlook ethically and morally problematic elements within essays, erroneously assigning high scores to essays that may propagate harmful opinions. In this study, we introduce the Harmful Essay Detection (HED) benchmark, which includes essays integrating sensitive topics such as racism and gender bias, to test the efficacy of various LLMs in recognizing and scoring harmful content. Our findings reveal that: (1) LLMs require further enhancement to accurately distinguish between harmful and argumentative essays, and (2) both current AES models and LLMs fail to consider the ethical dimensions of content during scoring. The study underscores the…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Explainable Artificial Intelligence (XAI)