Performance of the Pre-Trained Large Language Model GPT-4 on Automated Short Answer Grading

Gerd Kortemeyer

arXiv:2309.09338 · cs.CL · September 19, 2023 · 5 citations

TL;DR

This paper evaluates GPT-4, used without additional training, on automated short answer grading with the SciEntsBank and Beetle benchmark datasets, and compares its performance to that of specialized models.

Contribution

It assesses GPT-4's capabilities on automated short answer grading (ASAG) tasks, highlighting its strengths and limitations relative to specialized models.

Findings

01

GPT-4 performs comparably to hand-engineered models.

02

GPT-4 underperforms compared to specialized pre-trained LLMs.

03

Withholding reference answers affects grading performance.

Abstract

Automated Short Answer Grading (ASAG) has been an active area of machine-learning research for over a decade. It promises to let educators grade and give feedback on free-form responses in large-enrollment courses in spite of limited availability of human graders. Over the years, carefully trained models have achieved increasingly higher levels of performance. More recently, pre-trained Large Language Models (LLMs) emerged as a commodity, and an intriguing question is how a general-purpose tool without additional training compares to specialized models. We studied the performance of GPT-4 on the standard benchmark 2-way and 3-way datasets SciEntsBank and Beetle, where in addition to the standard task of grading the alignment of the student answer with a reference answer, we also investigated withholding the reference answer. We found that overall, the performance of the pre-trained…
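The two grading conditions mentioned in the abstract (with and without a reference answer) and the 2-way and 3-way label sets can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the function names `build_grading_prompt` and `macro_f1`, and the choice of macro-averaged F1 as a metric are not taken from the paper, and the label names follow the SemEval-2013 convention for these datasets as commonly described. No model is called; the sketch only shows how a prompt might be assembled and how predicted labels could be scored against gold labels.

```python
from typing import Optional, Sequence

# Label sets assumed from the SemEval-2013 convention (not stated in this excerpt).
LABELS_2WAY = ("correct", "incorrect")
LABELS_3WAY = ("correct", "contradictory", "incorrect")


def build_grading_prompt(
    question: str,
    student_answer: str,
    labels: Sequence[str],
    reference_answer: Optional[str] = None,
) -> str:
    """Assemble a hypothetical grading prompt.

    If reference_answer is None, the reference answer is withheld,
    mirroring the second condition described in the abstract.
    """
    lines = [
        "Grade the student answer to the question below.",
        f"Respond with exactly one of: {', '.join(labels)}.",
        f"Question: {question}",
    ]
    if reference_answer is not None:
        lines.append(f"Reference answer: {reference_answer}")
    lines.append(f"Student answer: {student_answer}")
    return "\n".join(lines)


def macro_f1(gold: Sequence[str], predicted: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 scores (one common ASAG metric)."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # A label with no gold or predicted instances contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up example item, not drawn from SciEntsBank or Beetle.
    prompt = build_grading_prompt(
        question="Why does the bulb light up when the switch is closed?",
        student_answer="Because the circuit is complete.",
        labels=LABELS_3WAY,
        reference_answer="Closing the switch completes the circuit.",
    )
    print(prompt)
```

Passing `reference_answer=None` produces the same prompt without the reference line, so both conditions can be run on identical items and compared with the same scoring function.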

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Adam · Layer Normalization