Fine-Tuning Models for Automated Code Review Feedback
Smitha S Kumar, Michael A Lones, Manuel Maarek, Hind Zantout

TL;DR
This study explores whether parameter-efficient fine-tuning (PEFT) of an open LLM, Code Llama, can improve automated code review feedback enough to make it comparable to that of proprietary models such as ChatGPT.
Contribution
The paper demonstrates that PEFT improves feedback quality significantly more than prompt engineering does, enabling effective, freely deployable educational tools.
Findings
PEFT improves feedback quality on buggy Java code
PEFT outperforms prompt engineering in feedback generation
Students find PEFT-generated feedback as effective as ChatGPT
Abstract
Large Language Models have introduced new possibilities for programming education through personalized support, content creation, and automated feedback. While recent studies have demonstrated the potential for feedback generation, many techniques rely on proprietary models, raising concerns about cost, computational demands, and the ethical implications of sharing student code. Open LLMs provide an alternative approach, but they do not currently have the capabilities of proprietary models. To address this problem, we investigate whether parameter-efficient fine-tuning (PEFT) and prompt engineering, both of which distil knowledge from a dataset derived from a large, more capable model, can be used to adapt and enhance the quality of feedback generated by the open LLM Code Llama. Feedback quality on buggy Java code was assessed using a combination of student evaluation, manual annotation…
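To give a sense of why PEFT makes adapting an open model affordable, here is a minimal sketch that compares trainable parameter counts for a single weight matrix. The abstract excerpt above does not say which PEFT method the authors used. Low-rank adaptation (LoRA) is assumed here purely for illustration, and the layer size and rank are hypothetical values, not figures from the paper.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in (base matrix frozen)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"fraction trained: {lora / full:.4%}")
```

Under these assumed values, the low-rank update trains well under one percent of the weights in the layer, which is the general reason PEFT methods are practical on modest hardware.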
