Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
Jasmina Gajcin, James McCarthy, Rahul Nair, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

TL;DR
This paper introduces ITERS, an iterative reward shaping method that uses human feedback and explanations to correct misspecified reward functions in reinforcement learning, improving training outcomes.
Contribution
The paper presents a novel iterative reward shaping approach that incorporates human trajectory feedback and explanations to mitigate reward misspecification in RL training.
Findings
Successfully corrects misspecified reward functions in multiple environments
Reduces user effort by leveraging explanations alongside feedback
Improves RL training effectiveness through iterative reward adjustments
Abstract
A well-defined reward function is crucial for successful training of a reinforcement learning (RL) agent. However, defining a suitable reward function is a notoriously challenging task, especially in complex, multi-objective environments. Developers often have to resort to starting with an initial, potentially misspecified reward function, and iteratively adjusting its parameters, based on observed learned behavior. In this work, we aim to automate this process by proposing ITERS, an iterative reward shaping approach using human feedback for mitigating the effects of a misspecified reward function. Our approach allows the user to provide trajectory-level feedback on the agent's behavior during training, which can be integrated as a reward shaping signal in the following training iteration. We also allow the user to provide explanations of their feedback, which are used to augment the…
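To make the idea of trajectory-level feedback acting as a shaping signal concrete, here is a minimal Python sketch. It is not the authors' implementation: the `FeedbackShaper` class, the discrete `(state, action)` encoding, the exact-match rule, and the `penalty` and `weight` parameters are all assumptions introduced for illustration. It only shows how segments flagged by a user in one iteration could lower the reward for the same behavior in the next.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: one step is a (state, action) pair of integers.
Step = Tuple[int, int]


class FeedbackShaper:
    """Illustrative only: stores user-flagged segments and penalizes repeats."""

    def __init__(self, penalty: float = 1.0, weight: float = 0.5) -> None:
        self.penalty = penalty  # magnitude of the shaping penalty (assumed)
        self.weight = weight    # scale of shaping relative to env reward (assumed)
        self.flagged: List[Tuple[Step, ...]] = []

    def add_feedback(self, segment: Sequence[Step]) -> None:
        """Record a trajectory segment the user marked as undesirable."""
        if segment:  # ignore empty segments
            self.flagged.append(tuple(segment))

    def shaping(self, history: Sequence[Step]) -> float:
        """Return -penalty if the most recent steps exactly match a flagged segment."""
        for seg in self.flagged:
            n = len(seg)
            if n <= len(history) and tuple(history[-n:]) == seg:
                return -self.penalty
        return 0.0

    def shaped_reward(self, env_reward: float, history: Sequence[Step]) -> float:
        """Combine the environment reward with the weighted shaping signal."""
        return env_reward + self.weight * self.shaping(history)


# Iteration 1: the user flags a two-step segment as undesirable.
shaper = FeedbackShaper(penalty=1.0, weight=0.5)
shaper.add_feedback([(3, 1), (4, 1)])

# Iteration 2: the agent repeats that segment and receives a reduced reward.
history = [(2, 0), (3, 1), (4, 1)]
penalized = shaper.shaped_reward(1.0, history)
unaffected = shaper.shaped_reward(1.0, [(2, 0), (3, 0)])
```

Exact matching is the simplest possible rule and is used here only to keep the sketch short. The abstract indicates that user explanations are used to augment the feedback, which suggests generalizing beyond the exact segments a user marked.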
Taxonomy
Topics: Software Engineering Research · Reinforcement Learning in Robotics · EEG and Brain-Computer Interfaces
