Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Katherine Metcalf; Miguel Sarabia; Natalie Mackraz; Barry-John Theobald

arXiv:2402.17975 · cs.AI · February 29, 2024 · 1 citation

TL;DR

This paper introduces a dynamics-aware reward learning method for preference-based reinforcement learning that improves sample efficiency by roughly an order of magnitude and raises final policy performance while requiring fewer human preference labels.

Contribution

It combines preference-based reward learning with a dynamics-aware state-action representation learned through a self-supervised temporal consistency task, and bootstraps the preference-based reward function from that representation.
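The self-supervised, dynamics-aware representation can be illustrated with a minimal sketch. It assumes a SimSiam-style negative-cosine objective: embed the state-action pair (s, a) as z^{sa}, embed the observed next state s', and penalize low cosine similarity between the two. The linear encoders, the weight matrices `w_sa` and `w_s`, and the exact loss form are hypothetical illustrations, not the paper's architecture. In a real training loop the encoders would be neural networks and the next-state branch would typically act as a stop-gradient target.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Hypothetical bias-free linear layer: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between u and v; eps guards against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def temporal_consistency_loss(
    state: Sequence[float],
    action: Sequence[float],
    next_state: Sequence[float],
    w_sa: Sequence[Sequence[float]],
    w_s: Sequence[Sequence[float]],
) -> float:
    """Negative-cosine consistency loss: 1 - cos(z^{sa}, z^{s'}).

    Near 0 when the state-action embedding points toward the embedding of the
    observed next state; grows (up to 2) as the two embeddings diverge.
    """
    z_sa = linear(w_sa, list(state) + list(action))  # hypothetical z^{sa} encoder
    z_next = linear(w_s, next_state)                 # hypothetical next-state encoder
    return 1.0 - cosine_similarity(z_sa, z_next)


# Usage: identity-like encoders on a 2-D state and a 1-D action.
w_sa = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_s = [[1.0, 0.0], [0.0, 1.0]]
aligned = temporal_consistency_loss([1.0, 0.0], [0.5], [1.0, 0.0], w_sa, w_s)
orthogonal = temporal_consistency_loss([1.0, 0.0], [0.5], [0.0, 1.0], w_sa, w_s)
```

Minimizing this loss over observed transitions pushes z^{sa} to encode information about where the environment goes next, which is what makes the representation "dynamics-aware" in this sketch.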

Findings

01. Achieves performance comparable to existing approaches with 10x fewer preference labels (50 instead of 500).

02. Improves final policy performance over existing methods.

03. Demonstrates effectiveness on the quadruped-walk, walker-walk, and cheetah-run tasks.

Abstract

Preference-based reinforcement learning (PbRL) aligns robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) learning a dynamics-aware state-action representation z^{sa} via a self-supervised temporal consistency task, and (2) bootstrapping the preference-based reward function from z^{sa}, which results in faster policy learning and better final policy performance. For example, on quadruped-walk, walker-walk, and cheetah-run, with 50 preference labels we achieve the same performance as existing approaches with 500 preference labels, and we recover 83% and 66% of ground-truth reward policy performance versus only 38% and 21%. The performance gains demonstrate the benefits…
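To make step (2) concrete, here is a minimal sketch of learning a reward from binary preferences. It assumes the Bradley-Terry model commonly used in PbRL: a segment's predicted return is the sum of its per-step predicted rewards, and the probability that a human prefers segment a over segment b is the logistic of the return difference. The linear reward head `head`, the helper names, and the use of precomputed z^{sa} vectors as inputs are hypothetical; the paper's exact reward architecture and loss may differ.

```python
import math
from typing import Sequence

Embedding = Sequence[float]


def sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reward(z_sa: Embedding, head: Sequence[float]) -> float:
    """Hypothetical linear reward head applied to a state-action embedding z^{sa}."""
    return sum(w * z for w, z in zip(head, z_sa))


def segment_return(segment: Sequence[Embedding], head: Sequence[float]) -> float:
    """Predicted return of a segment: the sum of per-step predicted rewards."""
    return sum(reward(z, head) for z in segment)


def preference_probability(seg_a, seg_b, head) -> float:
    """Bradley-Terry: P(a > b) = exp(R_a) / (exp(R_a) + exp(R_b)) = sigmoid(R_a - R_b)."""
    return sigmoid(segment_return(seg_a, head) - segment_return(seg_b, head))


def preference_loss(seg_a, seg_b, label: int, head, eps: float = 1e-12) -> float:
    """Binary cross-entropy on one human label: 1 if segment a was preferred, 0 if b."""
    p = preference_probability(seg_a, seg_b, head)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))


# Usage: two 2-step segments of 2-D embeddings; the head rewards the first coordinate.
head = [1.0, 0.0]
seg_a = [[1.0, 0.0], [1.0, 0.0]]  # predicted return 2.0
seg_b = [[0.0, 1.0], [0.0, 1.0]]  # predicted return 0.0
p_a = preference_probability(seg_a, seg_b, head)  # sigmoid(2.0), about 0.881
```

Fitting `head` by minimizing `preference_loss` over labelled segment pairs is the reward-learning step in this sketch. Under the iterated scheme the abstract describes, the z^{sa} inputs would be refreshed by the temporal consistency task between reward updates, so the reward is learned on top of a representation that already encodes environment dynamics.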

Code & Models

Repositories

apple/ml-reed
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics