Large Language Models as Partners in Student Essay Evaluation
Toru Ishida, Tongxi Liu, Hailong Wang, and William K. Cheung

TL;DR
This study explores whether Large Language Models can evaluate student essays. It shows that they can match faculty assessments and proposes that they serve as collaborative partners in the evaluation process.
Contribution
The paper introduces a novel application of LLMs to essay evaluation. It compares three scenarios (no guidance, pre-specified rubrics, and pairwise comparison of essays) and analyzes the quality and diversity of the resulting LLM assessments.
Findings
Strong correlation between LLM and faculty assessments in pairwise comparison with rubrics
LLMs can match faculty assessment capabilities
Variations in LLM assessments reflect diversity, not confusion
Abstract
As the importance of comprehensive evaluation in workshop courses increases, there is a growing demand for efficient and fair assessment methods that reduce the workload for faculty members. This paper presents an evaluation conducted with Large Language Models (LLMs) using actual student essays in three scenarios: 1) without providing guidance such as rubrics, 2) with pre-specified rubrics, and 3) through pairwise comparison of essays. Quantitative analysis of the results revealed a strong correlation between LLM and faculty member assessments in the pairwise comparison scenario with pre-specified rubrics, although concerns about the quality and stability of evaluations remained. Therefore, we conducted a qualitative analysis of LLM assessment comments, showing that: 1) LLMs can match the assessment capabilities of faculty members, 2) variations in LLM assessments should be interpreted…
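The abstract reports a strong correlation between LLM and faculty assessments in the pairwise-comparison scenario. This excerpt does not say how the pairwise judgments were aggregated or which correlation coefficient was used. The sketch below assumes one simple option: each essay is scored by its number of pairwise wins, and agreement with faculty is measured by Spearman's rank correlation. Both choices, the helper names (`win_counts`, `average_ranks`, `spearman`), and the toy data are hypothetical illustrations. They are not the authors' method or results.

```python
def win_counts(essays, judgments):
    """Count how many pairwise comparisons each essay wins.

    Hypothetical aggregation: `judgments` maps an (essay_a, essay_b) pair
    to the essay the LLM preferred for that pair.
    """
    wins = {e: 0 for e in essays}
    for (a, b), winner in judgments.items():
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not in pair {(a, b)!r}")
        wins[winner] += 1
    return wins


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks.

    Raises ZeroDivisionError when either input is constant (rho undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy example: hypothetical essays, LLM preferences, and faculty scores.
# None of these values come from the paper.
essays = ["A", "B", "C", "D"]
llm_prefs = {
    ("A", "B"): "A", ("A", "C"): "A", ("A", "D"): "A",
    ("B", "C"): "B", ("B", "D"): "D", ("C", "D"): "D",
}
faculty = {"A": 90, "B": 78, "C": 65, "D": 74}

wins = win_counts(essays, llm_prefs)
rho = spearman([wins[e] for e in essays], [faculty[e] for e in essays])
print(f"wins={wins}, spearman={rho:.2f}")
# prints: wins={'A': 3, 'B': 1, 'C': 0, 'D': 2}, spearman=0.80
```

Win counting is the simplest way to aggregate pairwise judgments. A probabilistic model such as Bradley-Terry would be a natural alternative when not every pair is compared. A rank correlation is used here because win counts and faculty marks are on different scales.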
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
