Using Large Language Models for Automated Grading of Student Writing about Science
Chris Impey, Matthew Wenger, Nikhil Garuda, Shahriar Golchin, Sarah Stamer

TL;DR
This study investigates GPT-4's ability to reliably grade student science writing in MOOCs, finding it matches or exceeds instructor and peer grading reliability, suggesting potential for scalable automated assessment.
Contribution
The paper demonstrates that GPT-4 can reliably evaluate and grade student writing in science MOOCs, offering a scalable alternative to traditional instructor grading methods.
Findings
GPT-4's grading reliability surpasses peer grading.
GPT-4's assessments match instructor grades.
Automated grading with LLMs is feasible for large classes.
Abstract
Assessing writing in large classes for formal or informal learners presents a significant challenge. Consequently, most large classes, particularly in science, rely on objective assessment tools such as multiple-choice quizzes, which have a single correct answer. The rapid development of AI has introduced the possibility of using large language models (LLMs) to evaluate student writing. An experiment was conducted using GPT-4 to determine if machine learning methods based on LLMs can match or exceed the reliability of instructor grading in evaluating short writing assignments on topics in astronomy. The audience consisted of adult learners in three massive open online courses (MOOCs) offered through Coursera. One course was on astronomy, the second was on astrobiology, and the third was on the history and philosophy of astronomy. The results should also be applicable to non-science…
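As a rough illustration of what "matching instructor reliability" could mean in practice, the sketch below compares two sets of grades against instructor grades using mean absolute difference and Pearson correlation. This is only an assumed way of quantifying agreement; the excerpt above does not state which statistics the authors used, and the scores, variable names, and helper functions here are hypothetical toy examples, not data from the study.

```python
from math import sqrt
from statistics import mean


def mean_absolute_difference(a, b):
    """Average absolute gap between two graders' scores for the same essays."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    return mean(abs(x - y) for x, y in zip(a, b))


def pearson(a, b):
    """Pearson correlation between two graders' scores for the same essays."""
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical toy scores for five essays (NOT data from the paper).
instructor = [8, 6, 9, 5, 7]
llm = [8, 7, 9, 5, 6]
peer = [10, 4, 9, 8, 7]

for name, scores in [("llm", llm), ("peer", peer)]:
    mad = mean_absolute_difference(instructor, scores)
    r = pearson(instructor, scores)
    print(f"{name}: mean abs diff = {mad:.2f}, Pearson r = {r:.2f}")
```

A smaller mean absolute difference and a higher correlation with the instructor would both indicate closer agreement. A real analysis would also need many more essays and some estimate of how much instructors disagree with each other.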
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Softmax · Dense Connections · Dropout · Residual Connection · Multi-Head Attention · Adam
