Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin

TL;DR
This paper introduces Reward Calibration from Demonstration (RCfD), a novel method that uses human demonstrations and reward models to recalibrate reward functions, effectively mitigating reward over-optimization in large language models without extensive hyperparameter tuning.
Contribution
The paper proposes RCfD, a demonstration-guided reinforcement learning approach that recalibrates the reward objective to prevent over-optimization, yielding more natural language generation and reducing reliance on KL regularization.
Findings
RCfD effectively mitigates reward over-optimization in LLMs.
RCfD achieves comparable performance to tuned baselines.
RCfD promotes diverse and natural language generation.
Abstract
While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyperparameter tuning. Additionally, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization: the reward function itself. Inspired by demonstration-guided RL, we here introduce the Reward Calibration from Demonstration (RCfD), which leverages human demonstrations and a reward model to recalibrate the reward objective. Formally, given a prompt, the RCfD objective minimizes the distance between the demonstrations' and LLM's rewards rather than directly maximizing the reward function. This objective shift avoids incentivizing the LLM to exploit the reward model and promotes more natural…
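The objective shift described in the abstract can be illustrated with a minimal sketch. The abstract only says that RCfD minimizes "the distance" between the demonstration's reward and the LLM's reward, so the squared difference used here is an assumption, and `rcfd_reward` and `toy_reward_model` are hypothetical names introduced for illustration, not the authors' code. The toy reward model simply counts words, standing in for a learned reward model that a policy could exploit by producing ever longer outputs.

```python
from typing import Callable

# A reward model maps (prompt, response) to a scalar score.
RewardModel = Callable[[str, str], float]


def rcfd_reward(
    reward_model: RewardModel,
    prompt: str,
    generation: str,
    demonstration: str,
) -> float:
    """Calibrated reward: negative distance to the demonstration's reward.

    Assumption: the distance is the squared difference. The maximum (zero)
    is reached when the generation scores exactly like the demonstration,
    so scoring higher than the demonstration is penalized as much as
    scoring lower.
    """
    target = reward_model(prompt, demonstration)
    achieved = reward_model(prompt, generation)
    return -((achieved - target) ** 2)


def toy_reward_model(prompt: str, response: str) -> float:
    """Hypothetical exploitable reward model: longer responses score higher."""
    return float(len(response.split()))


if __name__ == "__main__":
    prompt = "Summarize the article."
    demo = "a short human summary"  # 4 words
    for gen in ["too short", "about the right length", "far " * 20 + "too long"]:
        raw = toy_reward_model(prompt, gen)
        calibrated = rcfd_reward(toy_reward_model, prompt, gen, demo)
        print(f"raw={raw:5.1f}  calibrated={calibrated:7.1f}  text={gen[:30]!r}")
```

Under the raw reward, the longest output wins. Under the calibrated reward, the output whose score matches the demonstration's wins, which is why the policy has no incentive to push the reward model beyond the range covered by human demonstrations.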
Taxonomy
Topics: Scheduling and Optimization Algorithms · Reinforcement Learning in Robotics
