Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models
Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

TL;DR
This paper demonstrates that learned dynamics models can make preference-based reinforcement learning more sample-efficient and safer by reducing environment interactions and enabling reward pre-training without additional environment use.
Contribution
The paper introduces a method that uses learned dynamics models in PbRL to improve safety and sample efficiency, addressing the limitations of prior model-free approaches.
Findings
Preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL.
Diverse preference queries can be synthesized safely.
Reward pre-training can be done without environment interaction.
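To make the second finding concrete, here is a minimal, hypothetical sketch of synthesizing preference queries inside a learned model, so that candidate trajectories never touch the real environment. It assumes a 1-D linear system s' = a*s + b*u fit by least squares, and it picks the "most different" pair of imagined trajectories as a simple diversity heuristic. The function names, the linear model class, and the diversity criterion are illustrative assumptions, not the authors' method.

```python
import random


def fit_linear_dynamics(transitions):
    """Least-squares fit of s' = a*s + b*u from (s, u, s_next) tuples.

    Hypothetical 1-D model; solves the 2x2 normal equations in closed form.
    """
    sss = sum(s * s for s, _, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    ssu = sum(s * u for s, u, _ in transitions)
    sy = sum(s * sn for s, _, sn in transitions)
    uy = sum(u * sn for _, u, sn in transitions)
    det = sss * suu - ssu * ssu
    a = (sy * suu - ssu * uy) / det
    b = (sss * uy - ssu * sy) / det
    return a, b


def rollout(model, s0, actions):
    """Roll an action sequence through the learned model (no real environment)."""
    a, b = model
    states = [s0]
    for u in actions:
        states.append(a * states[-1] + b * u)
    return states


def synthesize_query(model, s0, horizon, num_candidates, rng):
    """Return the pair of imagined trajectories that differ most (L1 distance).

    The diversity heuristic is an illustrative assumption.
    """
    candidates = []
    for _ in range(num_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        candidates.append(rollout(model, s0, actions))
    best_pair, best_gap = None, -1.0
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            gap = sum(abs(x - y) for x, y in zip(candidates[i], candidates[j]))
            if gap > best_gap:
                best_pair, best_gap = (candidates[i], candidates[j]), gap
    return best_pair


# Usage: logged transitions from an (assumed) true system s' = 0.9*s + 0.5*u.
rng = random.Random(0)
data = []
for _ in range(50):
    s, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    data.append((s, u, 0.9 * s + 0.5 * u))

model = fit_linear_dynamics(data)
traj_a, traj_b = synthesize_query(model, s0=0.0, horizon=5, num_candidates=8, rng=rng)
```

Only the 50 logged transitions come from the real system; every candidate trajectory in the query is imagined. That is the sense in which model-based query synthesis avoids extra, and potentially unsafe, environment interaction.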
Abstract
Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal…
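Preference elicitation in PbRL commonly rests on a Bradley-Terry model of pairwise preferences, P(A preferred to B) = sigmoid(R(A) - R(B)), where R is the learned return of a trajectory. The excerpt above does not state that this paper uses exactly this form, so treat the following as an assumed, minimal sketch. It fits a linear reward over two hypothetical trajectory features from simulated preference labels; the random state lists stand in for trajectories that, in model-based PbRL, would be rolled out in the learned dynamics model.

```python
import math
import random


def features(traj):
    """Hypothetical trajectory features: sum of states and sum of squared states."""
    return [sum(traj), sum(s * s for s in traj)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predicted_return(w, traj):
    """Linear reward model: weighted sum of trajectory features."""
    return sum(wi * fi for wi, fi in zip(w, features(traj)))


def fit_reward(prefs, lr=0.5, steps=1000):
    """Maximize the Bradley-Terry log-likelihood of (preferred, other) pairs by gradient ascent."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for pref, other in prefs:
            p = sigmoid(predicted_return(w, pref) - predicted_return(w, other))
            diff = [fp - fo for fp, fo in zip(features(pref), features(other))]
            for k in range(len(w)):
                # d/dw log sigmoid(R_pref - R_other) = (1 - p) * (phi_pref - phi_other)
                grad[k] += (1.0 - p) * diff[k]
        w = [wi + lr * g / len(prefs) for wi, g in zip(w, grad)]
    return w


def true_return(traj):
    """Simulated labeler (assumption): prefers states that stay near 1.0."""
    return -sum((s - 1.0) ** 2 for s in traj)


# Usage: label random pairs with the simulated labeler, then fit the reward.
rng = random.Random(1)
prefs = []
for _ in range(40):
    t1 = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    t2 = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    prefs.append((t1, t2) if true_return(t1) >= true_return(t2) else (t2, t1))

w = fit_reward(prefs)
accuracy = sum(predicted_return(w, a) > predicted_return(w, b) for a, b in prefs) / len(prefs)
```

Because the simulated labeler's return, -sum((s - 1)^2), is linear in these two features (weights proportional to [2, -1] for equal-length trajectories), the learned w should point roughly in that direction. A practical system would instead use a neural reward over states and actions.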
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Transportation and Mobility Innovations
