Data Driven Reward Initialization for Preference based Reinforcement Learning

Mudit Verma; Subbarao Kambhampati

arXiv:2302.08733·cs.LG·February 20, 2023

TL;DR

This paper introduces a data-driven reward initialization approach for Preference-based Reinforcement Learning (PbRL) that reduces run-to-run variability and improves performance without adding human effort.

Contribution

The work proposes a reward initialization method that makes the initialized reward model's predictions uniform across the state space, decreasing seed-dependent variability and improving PbRL performance across runs.

Findings

01 Reduces reward model variability across runs

02 Improves overall PbRL performance

03 Maintains low human effort and computational cost

Abstract

Preference-based Reinforcement Learning (PbRL) methods utilize binary feedback from the human in the loop (HiL) over queried trajectory pairs to learn a reward model in an attempt to approximate the human's underlying reward function capturing their preferences. In this work, we investigate the issue of a high degree of variability in the initialized reward models which are sensitive to random seeds of the experiment. This further compounds the issue of degenerate reward functions PbRL methods already suffer from. We propose a data-driven reward initialization method that does not add any additional cost to the human in the loop and negligible cost to the PbRL agent and show that doing so ensures that the predicted rewards of the initialized reward model are uniform in the state space and this reduces the variability in the performance of the method across multiple runs and is shown to…
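
The abstract does not say how the data-driven initialization is carried out, so the following is only an illustrative sketch of the stated goal: an initialized reward model whose predictions are uniform over the state space. It assumes a small random-feature reward model and a pre-training step that regresses its output toward a constant on sampled states, with no human feedback involved. The architecture, the constant target, the learning rate, and all function names are hypothetical and not taken from the paper.

```python
import math
import random


def make_reward_model(state_dim, hidden, rng):
    """Randomly initialised reward model (hypothetical architecture)."""
    w1 = [[rng.gauss(0, 1) for _ in range(state_dim)] for _ in range(hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(hidden)]
    return {"w1": w1, "w2": w2, "b": 0.0}


def features(model, state):
    """Fixed tanh hidden layer; only the output layer is trained below."""
    return [math.tanh(sum(w * s for w, s in zip(row, state)))
            for row in model["w1"]]


def predict(model, state):
    h = features(model, state)
    return sum(w * x for w, x in zip(model["w2"], h)) + model["b"]


def spread(model, states):
    """Gap between the largest and smallest predicted reward."""
    preds = [predict(model, s) for s in states]
    return max(preds) - min(preds)


def initialise_uniform(model, states, target=0.0, lr=0.01, epochs=200):
    """Assumed initialization step: SGD on squared error to a constant target."""
    for _ in range(epochs):
        for s in states:
            h = features(model, s)
            err = predict(model, s) - target
            model["w2"] = [w - lr * err * x for w, x in zip(model["w2"], h)]
            model["b"] -= lr * err
    return model


rng = random.Random(0)
# Stand-in for states the agent has already collected (no human labels needed).
states = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
model = make_reward_model(4, 8, rng)

before = spread(model, states)
initialise_uniform(model, states)
after = spread(model, states)
print("reward spread before:", before)
print("reward spread after: ", after)
```

In this sketch the spread of predicted rewards over the sampled states shrinks after the pre-training step, so different random seeds start from a similar, near-constant reward model before any preference queries are made. Whether the paper uses this objective or a different one cannot be determined from the abstract.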

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Software Engineering Methodologies