A State Augmentation based approach to Reinforcement Learning from Human Preferences

Mudit Verma; Subbarao Kambhampati

arXiv:2302.08734·cs.AI·February 20, 2023·1 cites

Open Access

TL;DR

This paper introduces a state augmentation technique for preference-based reinforcement learning that enhances reward robustness and improves early training performance across multiple domains.

Contribution

The proposed state augmentation method significantly improves reward recovery and early training performance in preference-based reinforcement learning.

Findings

01. Enhanced reward recovery compared to the PEBBLE baseline

02. Improved early training performance across three domains

03. Effective across diverse tasks, from simple control to robotic manipulation

Abstract

Reinforcement Learning has suffered from poor reward specification and from reward hacking, even in simple domains. Preference-Based Reinforcement Learning attempts to solve this by using binary feedback from a human in the loop on queried trajectory pairs, indicating their preferences about the agent's behavior, to learn a reward model. In this work, we present a state augmentation technique that allows the agent's reward model to be robust and to follow an invariance consistency, which significantly improves performance, i.e. the reward recovery and the subsequent return computed using the learned policy, over our baseline PEBBLE. We validate our method on three domains, Mountain Car, a locomotion task of Quadruped-Walk, and a robotic manipulation task of Sweep-Into, and find that using the proposed augmentation the agent not only benefits in the overall performance but does…
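The abstract describes learning a reward model from binary preferences over trajectory pairs. The sketch below illustrates one common way to do this in preference-based RL, a Bradley-Terry model trained with cross-entropy. It is not taken from the paper. All function names are hypothetical, and `consistency_penalty` is only an assumed illustration of what an invariance consistency term could look like, since the abstract does not specify the augmentation used.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]
RewardFn = Callable[[State], float]


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(reward_fn: RewardFn, segment: Sequence[State]) -> float:
    """Sum of predicted rewards over one trajectory segment."""
    return sum(reward_fn(s) for s in segment)


def preference_prob(reward_fn: RewardFn,
                    seg_a: Sequence[State],
                    seg_b: Sequence[State]) -> float:
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(reward_fn, seg_a) - segment_return(reward_fn, seg_b)
    return _sigmoid(diff)


def preference_loss(reward_fn: RewardFn,
                    seg_a: Sequence[State],
                    seg_b: Sequence[State],
                    label: float,
                    eps: float = 1e-12) -> float:
    """Cross-entropy against a binary label (1.0 means seg_a was preferred)."""
    p = preference_prob(reward_fn, seg_a, seg_b)
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def consistency_penalty(reward_fn: RewardFn,
                        segment: Sequence[State],
                        augment: Callable[[State], State]) -> float:
    """ASSUMPTION: mean squared change in predicted reward under `augment`.

    `augment` is a hypothetical state transformation; the paper's actual
    augmentation and consistency term are not given in the abstract.
    """
    diffs = [(reward_fn(s) - reward_fn(augment(s))) ** 2 for s in segment]
    return sum(diffs) / len(diffs)
```

With a toy reward such as `lambda s: s[0]`, a segment with higher summed reward receives a preference probability above 0.5, and an identity `augment` gives a penalty of zero. In practice the reward function would be a trainable network and the two terms would be combined with a weighting chosen by the practitioner.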

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Data Stream Mining Techniques