An Empirical Study of Implicit Regularization in Deep Offline RL

Caglar Gulcehre; Srivatsan Srinivasan; Jakub Sygnowski; Georg Ostrovski; Mehrdad Farajtabar; Matt Hoffman; Razvan Pascanu; Arnaud Doucet

arXiv:2207.02099·cs.LG·July 8, 2022

TL;DR

This paper empirically investigates the relationship between implicit regularization, specifically effective rank collapse, and performance in deep offline reinforcement learning, revealing that the association is complex and context-dependent.

Contribution

The paper provides a detailed empirical analysis showing that the link between effective rank and performance is not straightforward and that several factors influence this relationship.

Findings

1. Effective rank collapse is only associated with performance in restricted settings.

2. Three phases of learning explain the impact of implicit regularization.

3. Bootstrapping alone does not cause effective rank collapse.

Abstract

Deep neural networks are the most commonly used function approximators in offline reinforcement learning. Prior works have shown that neural nets trained with TD-learning and gradient descent can exhibit implicit regularization that can be characterized by under-parameterization of these networks. Specifically, the rank of the penultimate feature layer, also called "effective rank", has been observed to drastically collapse during training. In turn, this collapse has been argued to reduce the model's ability to further adapt in later stages of learning, leading to diminished final performance. Such an association between the effective rank and performance makes effective rank compelling for offline RL, primarily for offline policy evaluation. In this work, we conduct a careful empirical study on the relation between effective rank and performance on three offline RL…
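
The abstract does not spell out how effective rank is computed. As a minimal sketch, assuming the threshold-based definition commonly used in this line of work (the smallest k whose top-k singular values account for at least a 1 - delta share of the sum of all singular values), it could be estimated from a batch of penultimate-layer features roughly as follows. The function name `effective_rank`, the `delta=0.01` default, and the (batch, feature_dim) layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def effective_rank(features, delta=0.01):
    """Threshold-based effective rank of a feature matrix (assumed definition).

    features: array of shape (batch, feature_dim), e.g. penultimate-layer
              activations evaluated on a batch of states (hypothetical input).
    delta:    tolerated share of singular-value mass that may be ignored.
    """
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    total = singular_values.sum()
    if total == 0.0:
        return 0  # An all-zero feature matrix has no informative directions.

    # Share of the total singular-value mass captured by the top-k values.
    cumulative_share = np.cumsum(singular_values) / total

    # First index whose cumulative share reaches 1 - delta; +1 turns it into a count.
    k = int(np.searchsorted(cumulative_share, 1.0 - delta)) + 1
    return min(k, len(singular_values))


# Equal singular values: all four directions are needed.
print(effective_rank(np.eye(4)))                    # 4
# One dominant direction holds more than 99% of the mass.
print(effective_rank(np.diag([100.0, 0.1, 0.1])))   # 1
```

Tracking this quantity over training checkpoints is one way to observe the kind of collapse the abstract describes; the choice of batch and of delta affects the value, so comparisons are only meaningful when both are held fixed.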

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and ELM · Stochastic Gradient Optimization Techniques