All-Action Policy Gradient Methods: A Numerical Integration Approach
Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon

TL;DR
This paper introduces a numerical integration approach to all-action policy gradient methods, expanding their applicability and reducing variance, leading to improved performance in continuous control tasks.
Contribution
It generalizes the all-action estimator to broader spaces and function classes, and provides new theoretical insights on biased critics in policy gradient methods.
Findings
Enhanced performance in continuous control tasks
Reduced variance of gradient estimates
Improved sample efficiency
Abstract
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] reducing the variance significantly compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general spaces and to any function class for the policy or critic components, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic which offers more guidance than the previous "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach in continuous…
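The contrast the abstract draws between the likelihood-ratio (REINFORCE) estimator and the "all-action" estimator can be illustrated with a small sketch. It is not the paper's implementation. It assumes a single fixed state, a one-dimensional Gaussian policy with mean `mu` and standard deviation `sigma`, and a hypothetical quadratic critic `q_critic`. The integral over actions is approximated with Gauss-Hermite quadrature, chosen here only because it suits a Gaussian policy; the paper's integration scheme may differ.

```python
import numpy as np


def q_critic(a):
    # Hypothetical critic for one fixed state (an assumption, not from the paper).
    return -(a - 1.0) ** 2


def score_mu(a, mu, sigma):
    # d/d(mu) of log N(a; mu, sigma^2).
    return (a - mu) / sigma ** 2


def reinforce_grad(mu, sigma, rng, n_samples=1):
    # Likelihood-ratio estimate: sample actions, average score * Q.
    a = rng.normal(mu, sigma, size=n_samples)
    return np.mean(score_mu(a, mu, sigma) * q_critic(a))


def all_action_grad(mu, sigma, n_nodes=5):
    # All-action estimate: integrate pi(a) * score(a) * Q(a) over actions
    # with Gauss-Hermite quadrature (weight function exp(-x^2)).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    a = mu + np.sqrt(2.0) * sigma * x  # change of variables to N(mu, sigma^2)
    return np.sum(w * score_mu(a, mu, sigma) * q_critic(a)) / np.sqrt(np.pi)


mu, sigma = 0.0, 0.5
rng = np.random.default_rng(0)

# Many independent single-sample REINFORCE estimates, to expose their spread.
reinforce_samples = np.array(
    [reinforce_grad(mu, sigma, rng) for _ in range(10_000)]
)
all_action = all_action_grad(mu, sigma)

print("REINFORCE mean:", reinforce_samples.mean())
print("REINFORCE variance:", reinforce_samples.var())
print("all-action estimate:", all_action)
```

For this critic the exact gradient with respect to `mu` is `-2 * (mu - 1)`, and the quadrature recovers it up to floating-point error because the integrand is a cubic polynomial. The single-sample REINFORCE estimates agree with it only on average. In a full reinforcement learning setting the all-action estimator still carries variance from sampling states, so the sketch isolates only the variance due to sampling actions.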
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Optimization and Search Problems
Methods: REINFORCE
