ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

Andrew Jesson, Chris Lu, Gunshi Gupta, Nicolas Beltran-Velez, Angelos Filos, Jakob Nicolaus Foerster, Yarin Gal

arXiv:2306.01460 · cs.LG · October 11, 2024 · 2 cites

TL;DR

This paper enhances on-policy actor-critic reinforcement learning by integrating ReLU advantages, spectral normalization, and dropout-based Bayesian inference, leading to improved performance and exploration capabilities.

Contribution

It introduces a novel combination of ReLU advantages, spectral normalization, and dropout for Bayesian inference in on-policy actor-critic algorithms, with theoretical and empirical validation.

Findings

01. Significant performance improvements over A3C, PPO, SAC, and TD3 on MuJoCo benchmarks.

02. Enhanced generalization on the ProcGen benchmark.

03. Theoretical justification for spectral normalization and Bayesian dropout in RL.

Abstract

This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning. It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm: (1) applying a ReLU function to advantage estimates, (2) spectral normalization of actor-critic weights, and (3) incorporating dropout as a Bayesian approximation. We prove under standard assumptions that restricting policy updates to positive advantages optimizes for value by maximizing a lower bound on the value function plus an additive term. We show that the additive term is bounded proportional to the Lipschitz constant of the value function, which offers theoretical grounding for spectral normalization of critic weights. Finally, our application of dropout corresponds to approximate Bayesian inference over both the actor and critic parameters, which enables…
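
The first two changes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the official repository uses JAX): the function names `relu_advantage_policy_loss` and `spectral_normalize`, the use of `returns - values` as the advantage estimate, and the power-iteration settings are all assumptions made for illustration. Dropout is omitted.

```python
import numpy as np


def relu_advantage_policy_loss(log_probs, returns, values):
    """Policy-gradient loss that keeps only positive advantages.

    Assumption: the advantage is estimated as returns - values; any
    other advantage estimator could be substituted here.
    """
    advantages = returns - values
    positive_advantages = np.maximum(advantages, 0.0)  # ReLU
    # Negative sign: minimizing this loss raises the log-probability
    # of actions whose advantage estimate is positive.
    return -np.mean(log_probs * positive_advantages)


def spectral_normalize(weight, n_iters=50, eps=1e-12):
    """Scale a 2-D weight matrix so its largest singular value is about 1.

    The largest singular value is estimated by power iteration.
    An all-zero matrix is not handled in this sketch.
    """
    rng = np.random.default_rng(0)
    u = rng.normal(size=weight.shape[0])
    for _ in range(n_iters):
        v = weight.T @ u
        v /= np.linalg.norm(v) + eps
        u = weight @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ weight @ v  # estimate of the spectral norm
    return weight / sigma


# Hypothetical toy batch of three transitions.
log_probs = np.array([-1.0, -2.0, -0.5])
returns = np.array([1.0, 0.0, 2.0])
values = np.array([0.5, 1.0, 2.0])
loss = relu_advantage_policy_loss(log_probs, returns, values)

# Hypothetical layer weights with singular values 3 and 1.
normalized = spectral_normalize(np.diag([3.0, 1.0]))
```

In the toy batch only the first transition has a positive advantage, so it alone contributes to the loss; the second transition, whose advantage is negative, is ignored rather than used to push its action's probability down.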

Code & Models

Repositories

anndvision/vsop (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms

Methods: Clipped Double Q-learning · Target Policy Smoothing · Adam · Global Average Pooling · Experience Replay · Dropout · Dilated Convolution · Proximal Policy Optimization · 1x1 Convolution · Average Pooling