AI-generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity
Yang Zhong, Jiangang Hao, Michael Fauss, Chen Li, Yuan Wang

TL;DR
This paper analyzes AI-generated essays' characteristics and their impact on automated scoring and academic integrity, highlighting limitations of current systems and potential detection strategies.
Contribution
It provides a large-scale empirical benchmark of AI-generated essays and evaluates the effectiveness of detection methods across different language models.
Findings
Existing automated scoring systems, such as e-rater, show limitations when scoring essays generated or heavily influenced by AI.
Detectors trained on essays from one model can often identify essays generated by other models.
AI detection remains feasible despite increasing model variety.
Abstract
The rapid advancement of large language models (LLMs) has enabled the generation of coherent essays, making AI-assisted writing increasingly common in educational and professional settings. Using large-scale empirical data, we examine and benchmark the characteristics and quality of essays generated by popular LLMs and discuss their implications for two key components of writing assessments: automated scoring and academic integrity. Our findings highlight limitations in existing automated scoring systems, such as e-rater, when applied to essays generated or heavily influenced by AI, and identify areas for improvement, including the development of new features to capture deeper thinking and recalibrating feature weights. Despite growing concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Online Learning and Analytics
