DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

TL;DR
This paper shows that the implicit regularization of SGD can be harmful in offline deep reinforcement learning, where it leads to degenerate feature representations, and proposes an explicit regularizer, DR3, that improves learning stability and performance.
Contribution
The paper identifies the negative impact of implicit regularization in offline deep RL and introduces DR3, an explicit regularizer that enhances stability and performance.
Findings
DR3 improves stability in offline RL tasks
Explicit regularization counteracts degenerate feature representations
Enhanced performance observed in Atari, D4RL, and robotic manipulation
Abstract
Despite overparameterization, deep networks trained via supervised learning are easy to optimize and exhibit excellent generalization. One hypothesis to explain this is that overparameterized deep networks enjoy the benefits of implicit regularization induced by stochastic gradient descent, which favors parsimonious solutions that generalize well on test inputs. It is reasonable to surmise that deep reinforcement learning (RL) methods could also benefit from this effect. In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions with…
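The abstract above is cut off before it describes the regularizer itself. In the full paper, DR3 penalizes the dot product between the learned features of consecutive state-action pairs that appear in the Bellman backup. The sketch below illustrates that idea in plain Python under stated assumptions: the names `dr3_penalty` and `regularized_loss`, the weight `c0`, and the representation of features as lists of floats are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def dr3_penalty(phi_sa: Sequence[Vector], phi_next: Sequence[Vector]) -> float:
    """Mean dot product between features of consecutive state-action pairs.

    phi_sa[i]   : features of (s, a) for transition i (hypothetical layout)
    phi_next[i] : features of (s', a') for the same transition
    """
    if not phi_sa or len(phi_sa) != len(phi_next):
        raise ValueError("need two non-empty batches of equal size")
    return sum(dot(u, v) for u, v in zip(phi_sa, phi_next)) / len(phi_sa)


def regularized_loss(td_loss: float,
                     phi_sa: Sequence[Vector],
                     phi_next: Sequence[Vector],
                     c0: float) -> float:
    """TD loss plus the weighted feature dot-product penalty.

    c0 is a hypothetical weight; it has to be tuned per task.
    """
    return td_loss + c0 * dr3_penalty(phi_sa, phi_next)


if __name__ == "__main__":
    # Two transitions with 2-dimensional features (made-up numbers).
    phi_sa = [[1.0, 0.0], [0.0, 1.0]]
    phi_next = [[1.0, 0.0], [1.0, 0.0]]
    print(dr3_penalty(phi_sa, phi_next))
    print(regularized_loss(1.0, phi_sa, phi_next, c0=0.1))
```

In a real training loop the features would be the penultimate-layer activations of the Q-network and the penalty would be differentiated together with the TD loss; the sketch only shows how the scalar penalty is formed.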
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Stochastic Gradient Descent
