ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Chris Lu, Gunshi Gupta, Nicolas Beltran-Velez, Angelos Filos, Jakob Nicolaus Foerster, and Yarin Gal

TL;DR
This paper enhances on-policy actor-critic reinforcement learning by integrating ReLU advantages, spectral normalization, and dropout-based Bayesian inference, leading to improved performance and exploration capabilities.
Contribution
It introduces a novel combination of ReLU advantages, spectral normalization, and dropout for Bayesian inference in on-policy actor-critic algorithms, with theoretical and empirical validation.
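The dropout ingredient follows the "dropout as a Bayesian approximation" idea. Dropout stays active when the network is evaluated, so each forward pass uses a different random mask, and the spread of the resulting outputs serves as an approximate posterior predictive. The sketch below is a minimal, hypothetical illustration for a one-hidden-layer critic in plain Python. The parameter layout, `critic_forward`, `mc_dropout_value`, the dropout rate, and the sample count are assumptions for illustration, not the authors' code or settings.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# Hypothetical parameter layout for a one-hidden-layer critic:
# (w1: hidden x state matrix, b1: hidden biases, w2: output weights, b2: output bias)
CriticParams = Tuple[List[List[float]], List[float], List[float], float]


def critic_forward(state: Sequence[float], params: CriticParams,
                   p_drop: float, rng: random.Random) -> float:
    """One stochastic pass: V(s) = w2 . dropout(relu(W1 s + b1)) + b2."""
    w1, b1, w2, b2 = params
    hidden = []
    for row, bias in zip(w1, b1):
        h = max(0.0, sum(w * s for w, s in zip(row, state)) + bias)
        # Inverted dropout: drop the unit with probability p_drop and rescale
        # survivors by 1 / (1 - p_drop) so the expected activation is unchanged.
        h = 0.0 if rng.random() < p_drop else h / (1.0 - p_drop)
        hidden.append(h)
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def mc_dropout_value(state: Sequence[float], params: CriticParams,
                     p_drop: float = 0.1, n_samples: int = 100,
                     seed: int = 0) -> Tuple[float, float]:
    """Keep dropout on at evaluation time and aggregate over sampled masks.

    Returns (predictive mean, predictive standard deviation) of V(s).
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if n_samples < 2:
        raise ValueError("need at least two samples to estimate a spread")
    rng = random.Random(seed)
    samples = [critic_forward(state, params, p_drop, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.stdev(samples)
```

With `p_drop=0.0` every pass is identical and the spread collapses to zero; any positive rate yields a nonzero spread that can be read as uncertainty over the critic's parameters. The same masking applies unchanged to an actor network.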
Findings
Significant performance improvements over A3C, PPO, SAC, and TD3 on MuJoCo benchmarks.
Enhanced generalization on the ProcGen benchmark.
Theoretical justification for spectral normalization and Bayesian dropout in RL.
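Spectral normalization divides a weight matrix by its largest singular value. For a feed-forward network whose activations are 1-Lipschitz (such as ReLU or tanh), the product of the layers' spectral norms upper-bounds the network's Lipschitz constant with respect to the L2 norm. This is the link the abstract draws between the additive term, bounded in proportion to the value function's Lipschitz constant, and spectral normalization of critic weights. A minimal pure-Python sketch follows. The helper names and iteration count are hypothetical. The sketch also restarts power iteration from a random vector on every call, whereas common implementations such as PyTorch's `torch.nn.utils.spectral_norm` keep a persistent vector and run one iteration per forward pass.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def _matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    """Compute W v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def _rmatvec(w: Matrix, u: Sequence[float]) -> List[float]:
    """Compute W^T u."""
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def _unit(x: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale x to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm + eps) for xi in x]


def spectral_norm(w: Matrix, n_iters: int = 50, seed: int = 0) -> float:
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    v = _unit([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(n_iters):
        u = _unit(_matvec(w, v))   # left singular vector estimate
        v = _unit(_rmatvec(w, u))  # right singular vector estimate
    # For a converged unit vector v, ||W v|| equals the top singular value.
    return math.sqrt(sum(x * x for x in _matvec(w, v)))


def spectrally_normalize(w: Matrix, **kwargs) -> Matrix:
    """Return W / sigma(W), whose largest singular value is approximately 1."""
    sigma = spectral_norm(w, **kwargs)
    return [[wij / sigma for wij in row] for row in w]


def lipschitz_upper_bound(layers: Sequence[Matrix]) -> float:
    """Product of per-layer spectral norms: an upper bound on the L2 Lipschitz
    constant of an MLP with 1-Lipschitz activations (biases do not affect it)."""
    bound = 1.0
    for w in layers:
        bound *= spectral_norm(w)
    return bound
```

After every layer passes through `spectrally_normalize`, `lipschitz_upper_bound` returns approximately 1, which is the sense in which the normalization controls the Lipschitz constant of the critic.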
Abstract
This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning. It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm: (1) applying a ReLU function to advantage estimates, (2) spectral normalization of actor-critic weights, and (3) incorporating dropout as a Bayesian approximation. We prove under standard assumptions that restricting policy updates to positive advantages optimizes for value by maximizing a lower bound on the value function plus an additive term. We show that the additive term is bounded proportional to the Lipschitz constant of the value function, which offers theoretical grounding for spectral normalization of critic weights. Finally, our application of dropout corresponds to approximate Bayesian inference over both the actor and critic parameters, which enables…
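Change (1) amounts to replacing the advantage A in the policy-gradient loss with max(A, 0). Transitions whose advantage is negative then contribute no policy gradient, and the update only pushes up the log-probability of actions that did better than the critic expected. A minimal sketch under that reading, in plain Python: `actor_loss` and `positive_only` are hypothetical names, advantages are assumed to be precomputed and treated as constants, and the critic loss and entropy bonus of a full A3C implementation are omitted.

```python
import math
from typing import Sequence


def relu(x: float) -> float:
    """Rectified linear unit: max(x, 0)."""
    return x if x > 0.0 else 0.0


def actor_loss(log_probs: Sequence[float], advantages: Sequence[float],
               positive_only: bool = True) -> float:
    """Mean policy-gradient surrogate loss over a batch of (state, action) samples.

    Standard A3C-style loss:  -mean(log pi(a|s) * A)
    ReLU-advantage variant:   -mean(log pi(a|s) * max(A, 0))
    Advantages are constants here: no gradient is meant to flow through them.
    """
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have the same length")
    weights = [relu(a) if positive_only else a for a in advantages]
    return -sum(lp * w for lp, w in zip(log_probs, weights)) / len(log_probs)


if __name__ == "__main__":
    lp = [math.log(0.5), math.log(0.25), math.log(0.8)]
    adv = [1.0, -2.0, 0.5]
    print(actor_loss(lp, adv))                       # ReLU-advantage loss
    print(actor_loss(lp, adv, positive_only=False))  # standard loss
```

For advantages [1.0, -2.0, 0.5], the second sample drops out of the ReLU loss entirely, whereas the standard loss (`positive_only=False`) would push that action's log-probability down.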
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
Methods: Clipped Double Q-learning · Target Policy Smoothing · Adam · Global Average Pooling · Experience Replay · Dropout · Dilated Convolution · Proximal Policy Optimization · 1x1 Convolution · Average Pooling
