Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge

Charles Koutcheme; Nicola Dainese; Sami Sarsa; Arto Hellas; Juho Leinonen; Paul Denny

arXiv:2405.05253·cs.CL·May 9, 2024

TL;DR

This study evaluates the quality of the feedback that open source LLMs produce for students in an introductory programming course, using GPT-4 as an automated judge. It finds that some open models perform comparably to proprietary ones, which supports their use in education.

Contribution

The paper demonstrates GPT-4's potential as an automated evaluator of feedback and assesses the quality of the feedback that open source LLMs generate in an educational setting, a previously understudied area.

Findings

01. GPT-4 shows moderate agreement with human evaluators.

02. Some open source models perform comparably to proprietary models.

03. GPT-4 tends to rate feedback more positively than human evaluators do.
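
Findings 01 and 03 describe two separate measurements: how often the judge and a human agree beyond chance, and whether the judge leans positive. The sketch below illustrates both on binary labels. It is an illustration only: this summary does not state which agreement statistic the paper uses, so Cohen's kappa is an assumption here, and the `human` and `judge` lists are made-up toy data, not results from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)
    # Fraction of items on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def positive_rate(labels):
    """Share of items rated positive (label 1)."""
    return sum(1 for label in labels if label == 1) / len(labels)


# Hypothetical toy ratings: 1 = "feedback is acceptable", 0 = "it is not".
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
judge = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

kappa = cohen_kappa(human, judge)
bias = positive_rate(judge) - positive_rate(human)
print(f"kappa={kappa:.2f}, judge positive-rate excess={bias:.2f}")
```

On these toy labels the two raters disagree on two items, and in both cases the judge is the more lenient one. That is the pattern a positive bias would produce, and it shows why agreement and bias are worth reporting separately.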

Abstract

Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as…

Code & Models

Repositories

koutchemecharles/iticse24
none · Official

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Softmax · Absolute Position Encodings · Byte Pair Encoding