The Mirage of Action-Dependent Baselines in Reinforcement Learning
George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

TL;DR
This paper critically examines the claimed benefits of action-dependent baselines in reinforcement learning. It shows that they do not reduce variance as previously claimed, and it identifies implementation details that affect the reported empirical results.
Contribution
It decomposes the variance of the policy gradient estimator and shows that learned state-action-dependent baselines do not reduce variance relative to state-dependent baselines on commonly tested benchmarks. It also clarifies which implementation details influence the observed gains.
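One standard way to split the variance of a policy gradient estimator is to apply the law of total variance twice: first over the randomness of the rest of the trajectory, then over the action, then over the state. A sketch, writing g-hat = A-hat(s, a, tau) times grad log pi(a | s) for a single-sample estimate (this notation is assumed here, not copied from the paper):

```latex
\operatorname{Var}_{s,a,\tau}\!\left[\hat g\right]
  = \underbrace{\mathbb{E}_{s,a}\!\left[\operatorname{Var}_{\tau \mid s,a}\!\left[\hat g\right]\right]}_{\Sigma_\tau}
  + \underbrace{\mathbb{E}_{s}\!\left[\operatorname{Var}_{a \mid s}\!\left[\mathbb{E}_{\tau \mid s,a}\!\left[\hat g\right]\right]\right]}_{\Sigma_a}
  + \underbrace{\operatorname{Var}_{s}\!\left[\mathbb{E}_{a,\tau \mid s}\!\left[\hat g\right]\right]}_{\Sigma_s},
\qquad
\hat g = \hat A(s,a,\tau)\,\nabla_\theta \log \pi_\theta(a \mid s).
```

With A-hat = Q-hat(s, a, tau) minus b(s), the baseline is constant given (s, a), so it drops out of Sigma_tau. Because the expectation of grad log pi(a | s) over a is zero, it also drops out of Sigma_s. Only Sigma_a depends on the baseline. An action-dependent baseline whose bias-correcting term is computed exactly also changes only Sigma_a, so when Sigma_a is small compared with Sigma_tau, neither kind of baseline can reduce the total variance by much.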
Findings
State-action baselines do not reduce variance over state-only baselines in tested domains.
Implementation details significantly impact the empirical effectiveness of action-dependent baselines.
A simple modification to value function parameterization can improve performance.
Abstract
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the…
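The abstract's central question is whether an action-dependent baseline reduces variance beyond what a state-dependent one already achieves. That question can be explored exactly in a small tabular example. The sketch below is our own illustration, not the authors' code or experimental setup. It assumes a one-step problem with a softmax policy over a hypothetical 3-state, 4-action table, known action values `Q`, and Gaussian reward noise standing in for the randomness of the rest of the trajectory. It also assumes the action-dependent baseline equals the true `Q`, which is the best case for that baseline. The hypothetical helper `decompose` enumerates states and actions to compute the mean gradient and the three variance terms (trajectory, action, state) for a baseline table `phi`. It adds the correction term `c(s)` that keeps an action-dependent baseline unbiased.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def grad_log_pi(pi, s, a):
    """Gradient of log pi(a|s) w.r.t. a tabular (S, A) logit table, flattened."""
    g = np.zeros_like(pi)
    g[s] = -pi[s]
    g[s, a] += 1.0
    return g.ravel()


def decompose(pi, Q, rho, sigma, phi):
    """Exact law-of-total-variance split for the single-sample estimator

        g = (r - phi[s, a]) * grad_log_pi(s, a) + c(s),  r = Q[s, a] + N(0, sigma^2),

    where c(s) = sum_a' pi(a'|s) phi[s, a'] grad_log_pi(s, a') keeps g unbiased
    (c(s) is zero when phi depends only on s). Variances are covariance traces.
    Returns (mean_gradient, Sigma_tau, Sigma_a, Sigma_s).
    """
    S, A = pi.shape
    sig_tau = sig_a = 0.0
    state_means = []  # E[g | s] for each state
    for s in range(S):
        grads = [grad_log_pi(pi, s, a) for a in range(A)]
        c = sum(pi[s, a] * phi[s, a] * grads[a] for a in range(A))
        # E[g | s, a]: the zero-mean reward noise averages out.
        m = [(Q[s, a] - phi[s, a]) * grads[a] + c for a in range(A)]
        m_s = sum(pi[s, a] * m[a] for a in range(A))
        for a in range(A):
            # Noise contributes Var(eps * grad) = sigma^2 * ||grad||^2.
            sig_tau += rho[s] * pi[s, a] * sigma**2 * (grads[a] @ grads[a])
            d = m[a] - m_s
            sig_a += rho[s] * pi[s, a] * (d @ d)
        state_means.append(m_s)
    mean = sum(rho[s] * state_means[s] for s in range(S))
    sig_s = sum(rho[s] * ((state_means[s] - mean) @ (state_means[s] - mean)) for s in range(S))
    return mean, sig_tau, sig_a, sig_s


rng = np.random.default_rng(0)
S, A, sigma = 3, 4, 2.0
pi = softmax(rng.normal(size=(S, A)))           # hypothetical policy
Q = 5.0 + rng.uniform(-1.0, 1.0, size=(S, A))   # hypothetical true action values
rho = np.full(S, 1.0 / S)                       # uniform state distribution
V = (pi * Q).sum(axis=1, keepdims=True)         # V(s) = E_{a~pi}[Q(s, a)]

baselines = {
    "none": np.zeros((S, A)),
    "state V(s)": np.repeat(V, A, axis=1),
    "state-action Q(s,a)": Q,  # best case: the action baseline is exact
}
results = {}
for name, phi in baselines.items():
    results[name] = decompose(pi, Q, rho, sigma, phi)
    _, t, a_var, s_var = results[name]
    print(f"{name:>20}: Sigma_tau={t:.3f}  Sigma_a={a_var:.3f}  Sigma_s={s_var:.3f}")
```

In this toy, all three estimators share the same mean gradient, Sigma_tau, and Sigma_s. Only Sigma_a changes, and the exact action-dependent baseline removes it entirely. Whether that matters in practice depends on how large Sigma_a is compared with Sigma_tau, which is the comparison the abstract's variance decomposition is meant to make on benchmark domains.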
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Neural dynamics and brain function
