Improving the Validity of Automatically Generated Feedback via Reinforcement Learning
Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan

TL;DR
This paper introduces a reinforcement learning framework to improve the correctness and pedagogical alignment of automatically generated math feedback using GPT-4 annotations, enhancing feedback quality in educational AI systems.
Contribution
It presents a novel rubric for evaluating math feedback and a reinforcement learning method that uses GPT-4 annotations to optimize feedback generation for correctness and pedagogical alignment.
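To make the rubric-plus-RL idea concrete, here is a minimal sketch of one way binary rubric annotations (for example, GPT-4's yes/no judgment per criterion) could be collapsed into a scalar score and then into preferred/rejected feedback pairs. That is the kind of data preference-based fine-tuning methods consume. The criterion names, the rule that incorrect feedback scores zero, and the helpers `rubric_score`, `preference_pairs` and `AnnotatedFeedback` are hypothetical illustrations, not the paper's actual rubric or training code.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical rubric criteria (assumption): the paper's actual rubric may differ.
CRITERIA = ("correct", "identifies_error", "suggests_fix", "encouraging")


@dataclass
class AnnotatedFeedback:
    """One candidate feedback message plus its per-criterion yes/no labels."""
    text: str
    labels: dict = field(default_factory=dict)  # criterion -> bool (e.g. from GPT-4)


def rubric_score(labels: dict) -> float:
    """Fraction of criteria satisfied; incorrect feedback scores 0 (assumption)."""
    if not labels.get("correct", False):
        return 0.0
    return sum(bool(labels.get(c, False)) for c in CRITERIA) / len(CRITERIA)


def preference_pairs(candidates, margin=0.0):
    """Build (preferred, rejected) text pairs from candidates for one student error."""
    pairs = []
    for a, b in combinations(candidates, 2):
        sa, sb = rubric_score(a.labels), rubric_score(b.labels)
        if sa - sb > margin:
            pairs.append((a.text, b.text))
        elif sb - sa > margin:
            pairs.append((b.text, a.text))
        # Ties (score gap within the margin) produce no pair.
    return pairs


candidates = [
    AnnotatedFeedback("Check the sign when you moved 3 to the other side.",
                      {"correct": True, "identifies_error": True,
                       "suggests_fix": True, "encouraging": False}),
    AnnotatedFeedback("Great job!", {"correct": False, "encouraging": True}),
]
print(preference_pairs(candidates))
# → [('Check the sign when you moved 3 to the other side.', 'Great job!')]
```

Gating the score on correctness reflects the emphasis on mathematical validity. A weighted sum over criteria would be an equally plausible design, and the `margin` parameter lets near-ties be skipped rather than forced into a noisy preference.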
Findings
Significant improvement in the correctness and alignment of feedback generated with Llama 2.
Effective use of GPT-4 for feedback annotation and evaluation.
Qualitative analysis demonstrates enhanced feedback quality.
Abstract
Automatically generating feedback via large language models (LLMs) in intelligent tutoring systems and online learning platforms has the potential to improve the learning outcomes of many students. However, both feedback generation and evaluation are challenging: feedback content has to be valid, especially in subjects like math, which requires models to understand the problem, the solution, and where the student's error lies. Feedback also has to be pedagogically valid to reflect effective tutoring strategies, such as explaining possible misconceptions and encouraging the student, among other desirable features. In this work, we address both problems of automatically generating and evaluating feedback while considering both correctness and alignment. First, we propose a rubric for evaluating math feedback and show that GPT-4 is able to effectively use it to annotate human-written and…
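One way to check whether GPT-4 applies such a rubric effectively is to compare its labels with human labels, one criterion at a time. The sketch below assumes binary per-criterion labels and reports raw accuracy plus Cohen's kappa. The paper may use different agreement statistics, the function names are illustrative, and the label lists are made up.

```python
from collections import Counter


def accuracy(human, model):
    """Share of items on which the model's label matches the human label."""
    if not human or len(human) != len(model):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohens_kappa(human, model):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = accuracy(human, model)
    n = len(human)
    human_counts, model_counts = Counter(human), Counter(model)
    # Expected agreement if both annotators labeled independently at their own rates.
    p_e = sum(human_counts[k] * model_counts[k]
              for k in set(human_counts) | set(model_counts)) / (n * n)
    if p_e == 1.0:
        # Both used one identical label everywhere; kappa is undefined, treat as 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for one rubric criterion (e.g. "feedback is correct").
human_labels = [True, True, False, True, False, False]
gpt4_labels = [True, False, False, True, False, True]
print(round(accuracy(human_labels, gpt4_labels), 3),
      round(cohens_kappa(human_labels, gpt4_labels), 3))  # → 0.667 0.333
```

Kappa is worth reporting alongside accuracy because rubric labels are often imbalanced (for instance, most human-written feedback being marked correct), and in that case raw accuracy alone overstates agreement.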
Taxonomy
Topics: Evolutionary Algorithms and Applications · Neural Networks and Applications
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Adam
