Improving the Validity of Automatically Generated Feedback via Reinforcement Learning

Alexander Scarlatos; Digory Smith; Simon Woodhead; Andrew Lan

arXiv:2403.01304 · cs.CL · December 13, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces a reinforcement learning framework to improve the correctness and pedagogical alignment of automatically generated math feedback using GPT-4 annotations, enhancing feedback quality in educational AI systems.

Contribution

It presents a novel rubric for evaluating math feedback, and a reinforcement learning method that optimizes feedback generation for correctness and pedagogical alignment using GPT-4 annotations.
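
As a rough illustration of how rubric annotations could drive such an optimization, the sketch below turns scored feedback candidates into preference pairs and evaluates a Direct Preference Optimization (DPO) style loss on scalar log-probabilities. That DPO is the objective is an assumption based only on the repository name listed on this page; the function names, the single aggregate score per candidate, and the `beta` value are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations


def build_preference_pairs(candidates, min_gap=0.0):
    """Turn (feedback_text, score) candidates into (chosen, rejected) pairs.

    `score` stands in for an aggregate rubric score (hypothetical); pairs whose
    scores differ by no more than `min_gap` are treated as ties and skipped.
    """
    pairs = []
    for (text_a, score_a), (text_b, score_b) in combinations(candidates, 2):
        if abs(score_a - score_b) <= min_gap:
            continue  # no clear preference between these two candidates
        if score_a > score_b:
            pairs.append((text_a, text_b))
        else:
            pairs.append((text_b, text_a))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one pair, given sequence log-probabilities.

    Computes -log(sigmoid(margin)), where margin is beta times the difference
    between the policy-vs-reference log-ratios of the chosen and rejected texts.
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    scored = [("feedback A", 1.0), ("feedback B", 0.5), ("feedback C", 1.0)]
    print(build_preference_pairs(scored))
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In a real training loop the log-probabilities would come from the policy model and a frozen reference model, and the loss would be averaged over a batch; the scalar version here is only meant to show the shape of the computation.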

Findings

01 With Llama 2, the method significantly improves both the correctness and the alignment of generated feedback.

02 GPT-4 can be used effectively to annotate and evaluate feedback.

03 A qualitative analysis indicates improved feedback quality.

Abstract

Automatically generating feedback via large language models (LLMs) in intelligent tutoring systems and online learning platforms has the potential to improve the learning outcomes of many students. However, both feedback generation and evaluation are challenging: feedback content has to be valid, especially in subjects like math, which requires models to understand the problem, the solution, and where the student's error lies. Feedback also has to be pedagogically valid to reflect effective tutoring strategies, such as explaining possible misconceptions and encouraging the student, among other desirable features. In this work, we address both problems of automatically generating and evaluating feedback while considering both correctness and alignment. First, we propose a rubric for evaluating math feedback and show that GPT-4 is able to effectively use it to annotate human-written and…

Code & Models

Repositories

umass-ml4ed/feedback-gen-dpo
PyTorch · Official

Taxonomy

Topics: Evolutionary Algorithms and Applications · Neural Networks and Applications

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Adam