DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Aviral Kumar; Rishabh Agarwal; Tengyu Ma; Aaron Courville; George Tucker; Sergey Levine

arXiv:2112.04716 · cs.LG · December 10, 2021 · 6 cites

TL;DR

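This paper shows that the implicit regularization of SGD, which is thought to help generalization in supervised learning, can be harmful in offline deep reinforcement learning, where it leads to poor generalization and degenerate feature representations. It proposes an explicit regularizer, DR3, to improve learning stability and performance.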

Contribution

The paper identifies the negative impact of implicit regularization in offline deep RL and introduces DR3, an explicit regularizer that enhances stability and performance.

Findings

1. DR3 improves stability in offline RL tasks
2. Explicit regularization counteracts degenerate feature representations
3. Enhanced performance observed in Atari, D4RL, and robotic manipulation
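
As a rough illustration of the second finding, the sketch below shows one way an explicit feature regularizer of this kind could be added to a temporal difference loss. It assumes the penalty is the dot product between the value network's last-layer features at consecutive state-action pairs, which is how the full paper describes DR3 but is not stated in this summary. The function names, the NumPy setting, and the coefficient `c0` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def dr3_penalty(phi_sa, phi_next):
    """Mean dot product between features of consecutive state-action pairs.

    phi_sa:   array of shape (batch, dim), features for (s, a)
    phi_next: array of shape (batch, dim), features for (s', a')
    Assumption: this dot-product form follows the full paper, not this summary.
    """
    return float(np.mean(np.sum(phi_sa * phi_next, axis=1)))


def regularized_td_loss(q_sa, td_target, phi_sa, phi_next, c0=0.1):
    """Squared TD error plus the weighted feature penalty.

    c0 is a hypothetical coefficient chosen only for illustration.
    """
    td_error = q_sa - td_target
    return float(np.mean(td_error ** 2)) + c0 * dr3_penalty(phi_sa, phi_next)


if __name__ == "__main__":
    # Toy batch: the first pair has fully aligned features, the second is orthogonal.
    phi_sa = np.array([[1.0, 0.0], [0.0, 1.0]])
    phi_next = np.array([[1.0, 0.0], [1.0, 0.0]])
    q_sa = np.array([1.0, 2.0])
    td_target = np.array([1.0, 0.0])
    print(dr3_penalty(phi_sa, phi_next))
    print(regularized_td_loss(q_sa, td_target, phi_sa, phi_next))
```

In practice the penalty would be computed on learned features inside an automatic differentiation framework and added to an existing offline RL objective; this NumPy version only shows the arithmetic.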

Abstract

Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with…

Videos

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Stochastic Gradient Descent