Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Noah Y. Siegel; Jost Tobias Springenberg; Felix Berkenkamp; Abbas Abdolmaleki; Michael Neunert; Thomas Lampe; Roland Hafner; Nicolas Heess; Martin Riedmiller

arXiv:2002.08396·cs.LG·June 18, 2020·48 cites

Open Access

TL;DR

This paper introduces a simple method that uses a learned prior, the advantage-weighted behavior model (ABM), to improve off-policy reinforcement learning in the batch setting, particularly for continuous control and robotics.

Contribution

It extends batch RL by incorporating a learned prior that biases policies towards previously successful actions, enabling stable learning from diverse data sources.

Findings

01. Improved performance on continuous control benchmarks.

02. Effective multi-task learning for robots.

03. Stable learning from conflicting data sources.

Abstract

Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior -- the advantage-weighted behavior model (ABM) -- to bias the RL policy towards actions that have previously been executed and are likely to be successful on the new task. Our method can be seen as an extension of recent work on batch-RL that enables stable learning from conflicting data-sources. We find improvements on competitive baselines in a…
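The idea of an advantage-weighted behavior model can be illustrated with a small sketch. It is not the authors' implementation: it assumes a linear-Gaussian prior fitted by weighted least squares, an indicator weighting that keeps only transitions with non-negative advantage, and advantages that are already given. The names `fit_abm_prior` and `demo`, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_abm_prior(states, actions, advantages):
    """Fit a linear-Gaussian prior by advantage-weighted maximum likelihood.

    Assumption: the weight is an indicator, 1 if advantage >= 0 else 0, so
    only actions that did at least as well as expected shape the prior.
    """
    weights = (advantages >= 0).astype(float)
    if weights.sum() == 0:
        raise ValueError("no transitions with non-negative advantage")
    # Append a bias column so the mean is W^T [s, 1].
    X = np.hstack([states, np.ones((len(states), 1))])
    # Weighted least squares: scale rows by sqrt(weight), then solve.
    sw = np.sqrt(weights)[:, None]
    W, *_ = np.linalg.lstsq(X * sw, actions * sw, rcond=None)
    residuals = actions - X @ W
    var = (weights[:, None] * residuals**2).sum(axis=0) / weights.sum()
    return W, var

def demo():
    rng = np.random.default_rng(0)
    states = rng.normal(size=(400, 2))
    # Two conflicting (synthetic) behavior policies: a = +s0 and a = -s0.
    good = rng.random(400) < 0.5
    actions = np.where(good, states[:, 0], -states[:, 0])[:, None]
    actions = actions + 0.01 * rng.normal(size=actions.shape)
    # Hypothetical advantages: positive for one source, negative for the other.
    advantages = np.where(good, 1.0, -1.0)
    W_abm, _ = fit_abm_prior(states, actions, advantages)
    # Zero advantages give every transition weight 1: plain behavior cloning.
    W_bc, _ = fit_abm_prior(states, actions, np.zeros(400))
    return W_abm, W_bc
```

In this toy setting the advantage-weighted fit recovers the mapping of the successful data source, while unweighted behavior cloning averages the two conflicting sources. In a full batch-RL method the fitted prior would then constrain the policy improvement step, which this sketch leaves out.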

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Smart Grid Energy Management