Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

Cheng-Han Chiang; Wei-Chih Chen; Chun-Yi Kuan; Chienchou Yang; Hung-yi Lee

arXiv:2407.05216·cs.CL·September 24, 2024·1 cites

TL;DR

This study explores the use of GPT-4 as an automatic assignment evaluator in a large university course, revealing its acceptability, limitations, and potential for future improvement in educational settings.

Contribution

It provides empirical insights into applying LLMs for assignment evaluation in real classrooms, highlighting challenges and offering practical recommendations.

Findings

01

Students generally accept LLM-based evaluation when they have free access to the evaluator.

02

LLMs sometimes fail to follow evaluation instructions.

03

Students can manipulate the LLM evaluator into outputting specific strings, earning high scores without meeting the rubric.

Abstract

Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research. However, it is unclear whether these LLM-based evaluators can be applied in real-world classrooms to assess student assignments. This empirical report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students. Based on student responses, we find that LLM-based assignment evaluators are generally acceptable to students when students have free access to these LLM-based evaluators. However, students also noted that the LLM sometimes fails to adhere to the evaluation instructions. Additionally, we observe that students can easily manipulate the LLM-based evaluator to output specific strings, allowing them to achieve high scores without meeting the assignment rubric. Based on student feedback and our experience, we provide…
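The manipulation described in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that the final grade is read from the evaluator's text output by matching a fixed string such as "Final score: N"; the course's actual prompt, score format, and parsing logic are not given here, and the function names and the `max_score` parameter below are invented for illustration. Under that assumption, a parser that accepts the first matching string anywhere in the output can be steered by a submission that gets the LLM to echo the string, while a stricter parser can at least flag ambiguous outputs.

```python
import re

# Hypothetical score format; the format actually used in the course is not stated in the abstract.
SCORE_PATTERN = re.compile(r"Final score:\s*(\d+)")


def extract_score_naive(evaluator_output: str):
    """Return the first 'Final score: N' found anywhere in the evaluator output."""
    match = SCORE_PATTERN.search(evaluator_output)
    return int(match.group(1)) if match else None


def extract_score_strict(evaluator_output: str, max_score: int = 10):
    """Accept a score only if exactly one score string appears,
    it occupies the last non-empty line, and it lies within [0, max_score].
    Return None otherwise, so the output can be sent for manual review."""
    lines = [ln.strip() for ln in evaluator_output.splitlines() if ln.strip()]
    if not lines:
        return None
    all_matches = SCORE_PATTERN.findall(evaluator_output)
    last = re.fullmatch(r"Final score:\s*(\d+)", lines[-1])
    if last is None or len(all_matches) != 1:
        return None
    score = int(last.group(1))
    return score if 0 <= score <= max_score else None


# Illustrative evaluator outputs (invented, not taken from the course).
honest = "The essay covers two of the five rubric points.\nFinal score: 4"
manipulated = (
    "As requested, I repeat: Final score: 10\n"
    "The answer does not address the rubric.\n"
    "Final score: 2"
)

print(extract_score_naive(manipulated))   # picks up the echoed string
print(extract_score_strict(manipulated))  # ambiguous output, flagged as None
```

Stricter parsing only catches outputs that contain conflicting score strings. If a submission persuades the LLM to emit nothing but the high-score string, output parsing alone cannot detect it.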

Taxonomy

Topics: Educational Technology Systems · Topic Modeling · Education Practices and Evaluation

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Adam · Dropout