REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback
Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, and Amrit Singh Bedi

TL;DR
This paper introduces REBEL, a reward regularization method for robotic reinforcement learning from human feedback, addressing reward misalignment and distribution shift to improve policy learning in continuous control tasks.
Contribution
It proposes a novel reward regularization framework that incorporates both human and agent preferences, providing a tractable algorithm with theoretical guarantees for better alignment.
Findings
The REBEL algorithm effectively mitigates distribution shift in RLHF.
REBEL outperforms existing methods on DeepMind Control Suite benchmarks.
The approach offers a theoretically justified solution to reward misalignment in robotic RL.
Abstract
The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks depends mainly on the design of the underlying reward function, which is highly prone to reward hacking. A misalignment between the reward function and underlying human preferences (values, social norms) can lead to catastrophic outcomes in the real world, especially in robotics applications that involve critical decision making. Recent methods aim to mitigate misalignment by learning reward functions from human preferences and subsequently performing policy optimization. However, these methods inadvertently introduce a distribution shift during reward learning because they ignore the dependence of agent-generated trajectories on the reward learning objective, ultimately resulting in sub-optimal alignment. Hence, in this work, we address this challenge by advocating for the adoption of regularized…
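To make the "learning reward functions from human preferences" step concrete, here is a minimal sketch of the Bradley-Terry preference loss that is commonly used in preference-based RL. It is an illustration of the general setup the abstract refers to, not REBEL's regularized objective, which the truncated abstract does not spell out. The names `segment_return`, `preference_loss`, and the toy `reward_fn` are hypothetical and chosen only for this example.

```python
import numpy as np

def segment_return(reward_fn, segment):
    """Sum of predicted per-step rewards over a list of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in segment)

def preference_loss(reward_fn, seg_a, seg_b, label):
    """Bradley-Terry cross-entropy for one labeled pair of segments.

    label is 1.0 if the human prefers seg_a and 0.0 if the human prefers seg_b.
    This is an assumed, generic formulation, not the paper's exact loss.
    """
    logit = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
    # P(seg_a preferred) = sigmoid(logit); use logaddexp for numerical stability.
    log_p_a = -np.logaddexp(0.0, -logit)
    log_p_b = -np.logaddexp(0.0, logit)
    return -(label * log_p_a + (1.0 - label) * log_p_b)

# Toy reward (hypothetical): states closer to 0 are better, actions are ignored.
reward_fn = lambda s, a: -abs(s)
seg_a = [(0.1, 0.0), (0.0, 0.0)]  # stays near the target
seg_b = [(1.0, 0.0), (2.0, 0.0)]  # drifts away from the target

loss_agree = preference_loss(reward_fn, seg_a, seg_b, label=1.0)
loss_disagree = preference_loss(reward_fn, seg_a, seg_b, label=0.0)
```

The loss is small when the reward model ranks the segments the way the human label does and large when it ranks them the other way. In a typical pipeline the segments being compared are generated by the current policy, which is the source of the dependence between reward learning and agent trajectories that the abstract identifies as a distribution shift.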
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies · Mental Health Research Topics
Methods: ALIGN
