Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning
Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

TL;DR
This paper introduces a simple method that uses a learned prior, the advantage-weighted behavior model (ABM), to make off-policy reinforcement learning work in the batch setting, especially for continuous control and robotics.
Contribution
It extends recent batch RL methods with a learned prior that biases the policy towards previously executed actions that are likely to succeed on the new task, enabling stable learning from conflicting data sources.
Findings
Improved performance on continuous control benchmarks.
Effective multi-task learning for robots.
Stable learning from conflicting data sources.
Abstract
Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior -- the advantage-weighted behavior model (ABM) -- to bias the RL policy towards actions that have previously been executed and are likely to be successful on the new task. Our method can be seen as an extension of recent work on batch-RL that enables stable learning from conflicting data-sources. We find improvements on competitive baselines in a…
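The idea of biasing the policy towards previously executed actions that are likely to succeed can be illustrated with a small sketch of an advantage-weighted behavior-cloning loss. This is an illustrative assumption, not the authors' implementation: the function names `abm_weights` and `weighted_nll`, the indicator weighting (keep a sample when its observed return is at least the value estimate), and the toy numbers are all hypothetical.

```python
# Minimal sketch of an advantage-weighted behavior-cloning objective.
# Assumption: a logged action is kept when its observed return is at least
# the value estimate for its state (advantage >= 0); other samples get weight 0.

def abm_weights(returns, values):
    """Return 1.0 for samples with non-negative advantage, else 0.0."""
    return [1.0 if r - v >= 0.0 else 0.0 for r, v in zip(returns, values)]


def weighted_nll(log_probs, weights):
    """Negative log-likelihood of logged actions, averaged over kept samples."""
    total = sum(weights)
    if total == 0.0:
        # No sample passed the filter, so there is nothing to imitate.
        return 0.0
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total


# Toy batch (hypothetical numbers): observed returns, value estimates, and
# the prior's log-probability of each logged action.
returns = [1.0, 0.2, 0.8, 0.1]
values = [0.5, 0.5, 0.5, 0.5]
log_probs = [-0.1, -2.0, -0.3, -1.5]

w = abm_weights(returns, values)
loss = weighted_nll(log_probs, w)
print(w, loss)
```

Minimizing this loss fits the prior only to actions that did at least as well as expected, so data from poorly performing or conflicting behavior policies is filtered out rather than imitated. In a full method the RL policy would then be kept close to this prior during policy improvement; that step is not shown here.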
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Smart Grid Energy Management
