Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example

Wei Xia; Shaoguang Mao; Chanjing Zheng

arXiv:2401.03401 · cs.CL · January 9, 2024 · 5 cites

Open Access

TL;DR

This study evaluates ChatGPT's effectiveness in automated scoring of TOEFL essays, highlighting its potential and limitations, especially with small sample sizes and the importance of expert-designed prompts.

Contribution

It demonstrates ChatGPT's capability for essay scoring in a low-data context and emphasizes the significance of prompt design and domain expertise.

Findings

01

ChatGPT can perform automated essay scoring with small samples.

02

Prompt design critically affects scoring accuracy.

03

Results show a regression effect in scoring performance.
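The first two findings concern small-sample scoring and prompt design. As a rough illustration only, the following Python sketch assembles a scoring prompt from a rubric and a handful of human-scored essays. It is not the authors' prompt: the names `ScoredExample` and `build_scoring_prompt` are hypothetical, the rubric text is expected to be supplied by the caller (for instance, descriptors adapted from an official scoring guide), and the default 0 to 5 range is an assumption about the score scale.

```python
from dataclasses import dataclass


@dataclass
class ScoredExample:
    """A human-scored essay used as a few-shot demonstration (hypothetical type)."""
    essay: str
    score: int
    comment: str


def build_scoring_prompt(task, rubric, examples, essay, min_score=0, max_score=5):
    """Assemble a plain-text scoring prompt.

    task:     the writing task shown to the test taker
    rubric:   dict mapping a score level to its descriptor (caller-supplied)
    examples: list of ScoredExample demonstrations (the "small sample")
    essay:    the essay to be scored
    The 0-5 default range is an assumption, not taken from the paper.
    """
    # Reject demonstrations whose scores fall outside the declared range.
    for ex in examples:
        if not min_score <= ex.score <= max_score:
            raise ValueError(
                f"example score {ex.score} outside {min_score}-{max_score}"
            )

    lines = [
        "You are scoring an essay written for the following task:",
        task,
        "",
        f"Scoring criteria (scores range from {min_score} to {max_score}):",
    ]
    # List rubric levels from highest to lowest score.
    for level in sorted(rubric, reverse=True):
        lines.append(f"Score {level}: {rubric[level]}")

    # Append each scored demonstration with its score and comment.
    for i, ex in enumerate(examples, start=1):
        lines += [
            "",
            f"Example {i}:",
            ex.essay,
            f"Score: {ex.score}",
            f"Comment: {ex.comment}",
        ]

    # End with the target essay and an open "Score:" slot for the model.
    lines += ["", "Essay to score:", essay, "Score:"]
    return "\n".join(lines)
```

The returned string can be passed to whichever chat model is being evaluated; the model call itself is left out here because it depends on the client library in use. Keeping the rubric and the demonstrations as separate arguments makes it easy to vary one while holding the other fixed when comparing prompt designs.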

Abstract

Large language models have demonstrated exceptional capabilities in tasks involving natural language generation, reasoning, and comprehension. This study aims to construct prompts and comments grounded in the diverse scoring criteria delineated within the official TOEFL guide. The primary objective is to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models, within the context of automated essay scoring. The prevailing methodologies for automated essay scoring involve the utilization of deep neural networks, statistical machine learning techniques, and fine-tuning pre-trained models. However, these techniques face challenges when applied to different contexts or subjects, primarily due to their substantial data requirements and limited adaptability to small sample sizes. In contrast, this study employs ChatGPT to conduct an automated…

Taxonomy

Topics: Educational Technology Systems · Text Readability and Simplification · Topic Modeling