Can Large Language Models Automatically Score Proficiency of Written Essays?

Watheq Mansour; Salam Albatarni; Sohaila Eltanbouly; Tamer Elsayed

arXiv:2403.06149 · cs.CL · April 17, 2024 · 3 citations

Open Access

TL;DR

This study evaluates the ability of large language models, ChatGPT and Llama, to score written essays automatically, comparing their performance to state-of-the-art models and exploring their potential to provide useful feedback.

Contribution

The paper demonstrates that LLMs can effectively score essays and offer feedback, with performance influenced by prompt design and model choice, highlighting their potential in automated essay scoring.
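
Because results depend on prompt design, it may help to see what separating a holistic prompt from a per-trait prompt can look like. The four prompts used in the paper are not reproduced on this page, so the following is only a minimal sketch under stated assumptions: the template wording, the example trait name, and the helpers `build_prompt` and `parse_score` are hypothetical, and no real model API is called. It shows how a prompt could be assembled for either scoring level and how a free-text model reply could be turned into a bounded integer score.

```python
import re
from typing import Optional

# Hypothetical templates for illustration only; the paper's four prompts
# are not reproduced here.
HOLISTIC_TEMPLATE = (
    "You are an experienced essay rater.\n"
    "Score the following essay on an integer scale from {low} to {high}.\n"
    "Reply with the score only.\n\n"
    "Essay:\n{essay}"
)

TRAIT_TEMPLATE = (
    "You are an experienced essay rater.\n"
    "Score the following essay only for the trait '{trait}' "
    "on an integer scale from {low} to {high}.\n"
    "Reply with the score only.\n\n"
    "Essay:\n{essay}"
)


def build_prompt(essay: str, low: int, high: int,
                 trait: Optional[str] = None) -> str:
    """Return a holistic prompt, or a per-trait prompt when `trait` is given."""
    if trait is None:
        return HOLISTIC_TEMPLATE.format(low=low, high=high, essay=essay)
    return TRAIT_TEMPLATE.format(trait=trait, low=low, high=high, essay=essay)


def parse_score(reply: str, low: int, high: int) -> Optional[int]:
    """Extract the first integer in a model reply and clamp it to [low, high].

    Returns None when the reply contains no integer, so the caller can retry
    the request instead of recording a made-up score.
    """
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    return max(low, min(high, int(match.group())))


if __name__ == "__main__":
    # "organization" is an example trait name chosen for illustration.
    prompt = build_prompt("Computers help people learn.", 1, 6, trait="organization")
    print(prompt)
    print(parse_score("Score: 4", 1, 6))  # 4
```

Returning `None` rather than a default score keeps unparseable replies visible. The first-integer heuristic would misread a reply that restates the scale before giving the score, which is why the hypothetical templates ask for the score only.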

Findings

01. ChatGPT and Llama show comparable average performance to each other in essay scoring.
02. Prompt choice significantly affects model performance.
03. LLMs can provide useful feedback to improve essay quality.

Abstract

Although several methods have been proposed to address automated essay scoring (AES) over the last 50 years, their effectiveness still leaves much to be desired. Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks. In this paper, we test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays. We experimented with two popular LLMs, namely ChatGPT and Llama. We aim to check whether these models can perform this task and, if so, how their performance compares with state-of-the-art (SOTA) models at two levels: holistically and per individual writing trait. We used prompt-engineering tactics to design four different prompts that bring out the models' maximum potential on this task. Our experiments conducted on the ASAP dataset revealed several interesting…
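
Positioning the LLMs against SOTA models requires an agreement metric between predicted and gold scores, and the abstract excerpt is cut off before naming one. AES results on ASAP are conventionally reported as quadratic weighted kappa (QWK), so the following minimal sketch assumes that metric; it is a from-scratch illustration, not the authors' evaluation code, and `quadratic_weighted_kappa` is a hypothetical helper name. For trait-level evaluation, the same function would be applied separately to each trait's gold and predicted scores.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             min_rating: int, max_rating: int) -> float:
    """Quadratic weighted kappa (QWK) between gold and predicted integer scores.

    1.0 means perfect agreement, 0.0 chance-level agreement, and negative
    values agreement worse than chance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    if max_rating <= min_rating:
        raise ValueError("max_rating must exceed min_rating")
    for r in list(y_true) + list(y_pred):
        if not min_rating <= r <= max_rating:
            raise ValueError(f"rating {r} outside [{min_rating}, {max_rating}]")

    n_cat = max_rating - min_rating + 1
    n = len(y_true)

    # Observed matrix: observed[i][j] counts essays with gold i and predicted j.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    # Marginal histograms give the expected matrix under independence.
    hist_true = Counter(t - min_rating for t in y_true)
    hist_pred = Counter(p - min_rating for p in y_pred)

    numerator = 0.0
    denominator = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            weight = (i - j) ** 2 / (n_cat - 1) ** 2  # quadratic disagreement penalty
            expected = hist_true[i] * hist_pred[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Zero expected disagreement only happens when both raters assign one and
    # the same score to every essay, which is perfect agreement.
    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 3], 1, 4))  # ~0.875
```

The quadratic weight penalizes a prediction two points off four times as much as one point off, which suits ordinal essay scores. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same statistic when the full score range is passed through its `labels` argument.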

Code & Models

Repositories

watheq9/aes-with-llms (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification