Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge
Charles Koutcheme, Nicola Dainese, Sami Sarsa, Arto Hellas, Juho Leinonen, Paul Denny

TL;DR
This study evaluates the quality of open source LLMs in providing student feedback, using GPT-4 as an evaluator, and finds some open models perform comparably to proprietary ones, supporting their educational use.
Contribution
It demonstrates GPT-4's potential as an automated evaluator and assesses open source LLMs' feedback quality in education, a previously understudied area.
Findings
GPT-4 shows moderate agreement with human evaluators.
Some open source models perform comparably to proprietary models.
GPT-4 tends to rate feedback more positively.
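To make the agreement and positivity findings concrete, here is a minimal Python sketch of how agreement between a GPT-4 judge and human annotators could be quantified. It assumes binary labels (1 = the feedback meets a criterion, 0 = it does not) and uses Cohen's kappa as the agreement statistic. Both the metric and the labels below are illustrative assumptions, not taken from the paper, and the helper names `cohen_kappa` and `positive_rate` are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where the raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each rater's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def positive_rate(labels):
    """Fraction of items a rater marked as positive (label 1)."""
    return sum(labels) / len(labels)


# Made-up labels for ten feedback samples (illustration only, not study data).
human = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
gpt4 = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]

print("kappa:", round(cohen_kappa(human, gpt4), 3))
print("human positive rate:", positive_rate(human))
print("GPT-4 positive rate:", positive_rate(gpt4))
```

In this toy data the judge marks more samples as positive than the human rater does, which is the kind of pattern the third finding describes. Whether a given kappa counts as "moderate" depends on the interpretation scale used; the summary does not say which one the authors applied.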
Abstract
Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as…
Taxonomy
Topics: Artificial Intelligence in Law
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Softmax · Absolute Position Encodings · Byte Pair Encoding
