The Difficulty of Passive Learning in Deep Reinforcement Learning
Georg Ostrovski, Pablo Samuel Castro, Will Dabney

TL;DR
This paper investigates the challenges of passive, offline deep reinforcement learning, highlighting the difficulties caused by function approximation and fixed data distributions through empirical analysis and a novel experimental paradigm.
Contribution
The paper introduces the tandem learning paradigm for empirical analysis and extends the understanding of offline RL challenges beyond the tabular and linear settings.
Findings
Function approximation combined with fixed data distributions is a major challenge.
Empirical results challenge previous hypotheses about offline RL difficulties.
Insights are relevant for both offline and online deep reinforcement learning.
Abstract
Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justifications are mostly limited to the tabular or linear cases. Given the impressive results of deep reinforcement learning, we argue for a need to more clearly understand the challenges in this setting. In the vein of Held & Hein's classic 1963 experiment, we propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning. We identify function approximation in conjunction with fixed data distributions as the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
