Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition: Taking TOEFL Independent Writing Task for Example
Wei Xia, Shaoguang Mao, Chanjing Zheng

TL;DR
This study evaluates ChatGPT as an automated scorer for TOEFL independent writing essays, highlighting its potential and limitations when only a small sample of essays is available, and the importance of expert-designed prompts.
Contribution
It demonstrates ChatGPT's capability for essay scoring in a low-data context and emphasizes the significance of prompt design and domain expertise.
Findings
ChatGPT can perform automated essay scoring with small samples.
Prompt design critically affects scoring accuracy.
Results show a regression effect in scoring performance.
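The findings on small samples and prompt design can be made concrete with a minimal sketch of how a rubric-grounded, few-shot scoring prompt might be assembled. This is an illustration under stated assumptions, not the authors' actual prompt or pipeline: the names `ScoredEssay`, `build_prompt`, and `parse_score` are hypothetical, the criteria strings are placeholders to be filled in from the official scoring guide, and the 0 to 5 scale is an assumed default that can be overridden.

```python
import re
from dataclasses import dataclass


@dataclass
class ScoredEssay:
    """A human-scored sample essay used as a few-shot example (hypothetical type)."""
    text: str
    score: int


def build_prompt(criteria, examples, essay, min_score=0, max_score=5):
    """Assemble a few-shot scoring prompt from rubric criteria and scored samples.

    The score range is an assumption; adjust it to the rubric in use.
    """
    lines = [
        f"You are an essay rater. Score the essay from {min_score} to {max_score}.",
        "Scoring criteria:",
    ]
    lines += [f"- {c}" for c in criteria]
    for i, ex in enumerate(examples, start=1):
        lines += [f"Example {i}:", ex.text, f"Score: {ex.score}"]
    lines += ["Essay to score:", essay, "Reply in the form 'Score: <integer>'."]
    return "\n".join(lines)


def parse_score(reply, min_score=0, max_score=5):
    """Extract the integer score from a model reply.

    Returns None when no score is found or the score is out of range.
    """
    match = re.search(r"Score:\s*(-?\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if min_score <= score <= max_score else None
```

The prompt string would be sent to whichever chat model is under evaluation, and `parse_score` applied to its reply. Keeping the criteria and examples as arguments makes it easy to compare prompt variants on the same small set of essays.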
Abstract
Large language models have demonstrated exceptional capabilities in tasks involving natural language generation, reasoning, and comprehension. This study aims to construct prompts and comments grounded in the diverse scoring criteria delineated within the official TOEFL guide. The primary objective is to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models, within the context of automated essay scoring. The prevailing methodologies for automated essay scoring involve the utilization of deep neural networks, statistical machine learning techniques, and fine-tuning pre-trained models. However, these techniques face challenges when applied to different contexts or subjects, primarily due to their substantial data requirements and limited adaptability to small sample sizes. In contrast, this study employs ChatGPT to conduct an automated…
Taxonomy
Topics: Educational Technology Systems · Text Readability and Simplification · Topic Modeling
