Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification

Jasmina Gajcin; James McCarthy; Rahul Nair; Radu Marinescu; Elizabeth Daly; Ivana Dusparic

arXiv:2308.15969·cs.AI·August 31, 2023

Open Access · 1 Repo

TL;DR

This paper introduces ITERS, an iterative reward shaping method that uses human feedback and explanations to correct misspecified reward functions in reinforcement learning, improving training outcomes.

Contribution

The paper presents a novel iterative reward shaping approach that incorporates human trajectory feedback and explanations to mitigate reward misspecification in RL training.

Findings

01 Successfully corrects misspecified reward functions in multiple environments

02 Reduces user effort by leveraging explanations alongside feedback

03 Improves RL training effectiveness through iterative reward adjustments

Abstract

A well-defined reward function is crucial for successful training of a reinforcement learning (RL) agent. However, defining a suitable reward function is a notoriously challenging task, especially in complex, multi-objective environments. Developers often have to start with an initial, potentially misspecified reward function and iteratively adjust its parameters based on the observed learned behavior. In this work, we aim to automate this process by proposing ITERS, an iterative reward shaping approach that uses human feedback to mitigate the effects of a misspecified reward function. Our approach allows the user to provide trajectory-level feedback on the agent's behavior during training, which can be integrated as a reward shaping signal in the following training iteration. We also allow the user to provide explanations of their feedback, which are used to augment the…
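The abstract describes trajectory-level feedback being folded into the next training iteration as a reward shaping signal. The following is a minimal sketch of that general idea, not the authors' implementation: the `ShapedReward` class, the `(state, action)` step encoding, the distance measure, and the `penalty` and `threshold` values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# Hypothetical encoding of one step: (state feature vector, discrete action).
Step = Tuple[Tuple[float, ...], int]


@dataclass
class ShapedReward:
    """Adds a fixed penalty to the environment reward whenever the agent's
    most recent steps resemble a trajectory the user flagged as undesirable.
    All parameters here are illustrative assumptions."""

    penalty: float = -1.0    # assumed shaping magnitude
    threshold: float = 0.5   # assumed maximum distance that counts as a match
    flagged: List[List[Step]] = field(default_factory=list)

    def add_feedback(self, trajectory: Sequence[Step]) -> None:
        """Store a trajectory segment the user marked as undesirable."""
        if trajectory:  # ignore empty segments
            self.flagged.append(list(trajectory))

    @staticmethod
    def _distance(a: Sequence[Step], b: Sequence[Step]) -> float:
        """Mean per-step distance: Euclidean on states, 0/1 on actions."""
        if len(a) != len(b):
            return float("inf")
        total = 0.0
        for (state_a, action_a), (state_b, action_b) in zip(a, b):
            total += sum((x - y) ** 2 for x, y in zip(state_a, state_b)) ** 0.5
            total += 0.0 if action_a == action_b else 1.0
        return total / len(a)

    def __call__(self, env_reward: float, recent: Sequence[Step]) -> float:
        """Return the shaped reward for the current step."""
        for trajectory in self.flagged:
            window = list(recent)[-len(trajectory):]
            if self._distance(window, trajectory) <= self.threshold:
                return env_reward + self.penalty
        return env_reward


# One feedback round: the user flags a segment, and the next training
# iteration uses the shaped reward in place of the raw environment reward.
shaper = ShapedReward(penalty=-2.0, threshold=0.1)
bad_segment = [((0.0, 1.0), 1), ((0.0, 2.0), 1)]
shaper.add_feedback(bad_segment)

shaped_bad = shaper(1.0, bad_segment)
shaped_ok = shaper(1.0, [((5.0, 5.0), 0), ((5.0, 6.0), 0)])
```

In an iterative setup of this kind, each round of training would be followed by a feedback step that calls `add_feedback`, so the shaping signal accumulates across iterations while the original environment reward stays unchanged.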

Code & Models

Repositories

anonymous902109/iters
PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Reinforcement Learning in Robotics · EEG and Brain-Computer Interfaces