Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

Yi Liu; Gaurav Datta; Ellen Novoseller; Daniel S. Brown

arXiv:2301.04741 · cs.LG · February 13, 2024 · 1 citation

TL;DR

This paper demonstrates that learned dynamics models can make preference-based reinforcement learning more sample-efficient and safer by reducing environment interactions and enabling reward pre-training without additional environment use.

Contribution

It introduces a method leveraging learned dynamics models in PbRL to improve safety and efficiency, addressing limitations of prior model-free approaches.

Findings

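1. Preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL.

2. Diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL.

3. Reward pre-training can be done without environment interaction.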

Abstract

Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal…
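
As a rough illustration of the idea in the abstract, the sketch below rolls two candidate behaviors forward inside a dynamics model to form a preference query, then scores the pair with a candidate reward function. It is only a sketch under stated assumptions: the Bradley-Terry preference model is a common choice in the PbRL literature but is not confirmed by this page as the paper's choice, the toy dynamics `s + a` stands in for a learned model, and every name here (`rollout`, `preference_prob`, `preference_loss`, the policies, the goal at 3) is hypothetical.

```python
import math


def rollout(dynamics, policy, state, horizon):
    """Roll a policy forward inside a dynamics model.

    No real environment steps are taken; the model predicts each next state.
    """
    segment = []
    for _ in range(horizon):
        action = policy(state)
        segment.append((state, action))
        state = dynamics(state, action)
    return segment


def segment_return(reward_fn, segment):
    """Sum of predicted rewards over a trajectory segment."""
    return sum(reward_fn(s, a) for s, a in segment)


def preference_prob(reward_fn, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b (assumed model)."""
    diff = segment_return(reward_fn, seg_b) - segment_return(reward_fn, seg_a)
    return 1.0 / (1.0 + math.exp(diff))


def preference_loss(reward_fn, seg_a, seg_b, label):
    """Cross-entropy loss; label is 1.0 if seg_a was preferred, 0.0 if seg_b was."""
    p = preference_prob(reward_fn, seg_a, seg_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Toy stand-ins (all hypothetical): a 1-D state, a "learned" model s' = s + a,
# two simple policies, and a candidate reward that favors states near 3.
def toy_dynamics(s, a):
    return s + a


def move_right(s):
    return 1.0


def move_left(s):
    return -1.0


def candidate_reward(s, a):
    return -abs(s - 3.0)


# Synthesize a preference query entirely inside the model.
seg_right = rollout(toy_dynamics, move_right, 0.0, horizon=3)
seg_left = rollout(toy_dynamics, move_left, 0.0, horizon=3)

p_right = preference_prob(candidate_reward, seg_right, seg_left)
loss_if_right_preferred = preference_loss(candidate_reward, seg_right, seg_left, 1.0)
loss_if_left_preferred = preference_loss(candidate_reward, seg_right, seg_left, 0.0)
```

In this toy setup the segment that moves toward the goal gets the higher predicted return, so the candidate reward assigns it the higher preference probability, and the loss is small when a human label agrees with that ranking. In a full method the reward function would be a trainable model updated to reduce this loss over many labeled queries.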

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Transportation and Mobility Innovations