The Difficulty of Passive Learning in Deep Reinforcement Learning

Georg Ostrovski; Pablo Samuel Castro; Will Dabney

arXiv:2110.14020 · cs.LG · October 28, 2021 · 1 citation

TL;DR

This paper investigates the challenges of passive, offline deep reinforcement learning, highlighting the difficulties caused by function approximation and fixed data distributions through empirical analysis and a novel experimental paradigm.

Contribution

It introduces the tandem learning paradigm for empirical analysis and extends understanding of offline RL challenges beyond traditional tabular and linear settings.

Findings

01. Function approximation combined with fixed data distributions is a major challenge.

02. Empirical results challenge previous hypotheses about offline RL difficulties.

03. Insights are relevant for both offline and online deep reinforcement learning.

Abstract

Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justifications are mostly limited to the tabular or linear cases. Given the impressive results of deep reinforcement learning, we argue for a need to more clearly understand the challenges in this setting. In the vein of Held & Hein's classic 1963 experiment, we propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning. We identify function approximation in conjunction with fixed data distributions as the…
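The tandem learning setup named in the abstract can be illustrated with a toy sketch. It assumes the core idea is that two independently initialized agents are trained on identical data, while only the "active" agent chooses actions in the environment. Everything concrete here is hypothetical and chosen for brevity: the five-state chain environment, the tabular Q-learning update, and the names `run_tandem`, `q_update` and `greedy_return` are illustrative and not taken from the paper. Because it is tabular, this sketch does not reproduce the function-approximation effects the paper studies; it only shows the shape of the experimental loop.

```python
import random


def init_q(n_states, n_actions, rng):
    """Small random Q-table, so the two agents start from different values."""
    return [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
            for _ in range(n_states)]


def step(s, a, n_states):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.
    Reaching the right-most state gives reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s_next, done, lr=0.5, gamma=0.9):
    """One in-place tabular Q-learning update."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += lr * (target - q[s][a])


def greedy_return(q, n_states, max_steps=50):
    """Return obtained when the given Q-table controls the agent greedily."""
    s = 0
    for _ in range(max_steps):
        a = max(range(2), key=lambda i: q[s][i])
        s, r, done = step(s, a, n_states)
        if done:
            return r
    return 0.0


def run_tandem(n_states=5, episodes=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    active_q = init_q(n_states, 2, rng)
    passive_q = init_q(n_states, 2, rng)  # independent initialization
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 50:
            # Only the active agent's values influence which action is taken.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: active_q[s][i])
            s_next, r, done = step(s, a, n_states)
            # Both agents learn from exactly the same transition.
            q_update(active_q, s, a, r, s_next, done)
            q_update(passive_q, s, a, r, s_next, done)
            s, steps = s_next, steps + 1
    return active_q, passive_q


if __name__ == "__main__":
    active_q, passive_q = run_tandem()
    print("active greedy return: ", greedy_return(active_q, 5))
    print("passive greedy return:", greedy_return(passive_q, 5))
```

Comparing the two greedy returns is the point of the exercise: the passive agent sees the same data as the active one but never influences how that data is collected, so any gap between them isolates the cost of learning passively.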

Code & Models

Repositories

deepmind/deepmind-research (tf, official)

Videos

The Difficulty of Passive Learning in Deep Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques