All-Action Policy Gradient Methods: A Numerical Integration Approach

Benjamin Petit; Loren Amdahl-Culleton; Yao Liu; Jimmy Smith; Pierre-Luc Bacon

arXiv:1910.09093·cs.LG·October 22, 2019

Open Access

TL;DR

This paper introduces a numerical integration approach to all-action policy gradient methods, expanding their applicability and reducing variance, leading to improved performance in continuous control tasks.

Contribution

It generalizes the all-action estimator to broader spaces and function classes, and provides new theoretical insights on biased critics in policy gradient methods.

Findings

01 Enhanced performance in continuous control tasks
02 Reduced variance of gradient estimates
03 Improved sample efficiency

Abstract

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] reducing the variance significantly compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general spaces and to any function class for the policy or critic components, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic which offers more guidance than the previous "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach in continuous…
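
The contrast the abstract draws between the sampled REINFORCE estimator and the integrated "all-action" estimator can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single state, a one-dimensional Gaussian policy parameterized only by its mean, a hypothetical quadratic critic, and a plain trapezoidal rule standing in for whichever quadrature scheme the authors use. All function names are illustrative.

```python
import numpy as np


def gaussian_pdf(a, mu, sigma):
    """Density of the assumed policy pi(a) = N(mu, sigma^2)."""
    return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def critic(a):
    """Hypothetical critic Q(a) for one fixed state (an assumption of this sketch)."""
    return -(a - 1.0) ** 2


def reinforce_estimates(mu, sigma, n_samples, rng):
    """Per-sample likelihood-ratio estimates: Q(a) * d/dmu log pi(a), a ~ pi."""
    a = rng.normal(mu, sigma, size=n_samples)
    score = (a - mu) / sigma ** 2  # d/dmu log N(a; mu, sigma^2)
    return critic(a) * score


def all_action_gradient(mu, sigma, n_points=2001, width=8.0):
    """Integrate d/dmu pi(a) * Q(a) over the action space with the trapezoidal rule."""
    grid = np.linspace(mu - width * sigma, mu + width * sigma, n_points)
    dpi_dmu = gaussian_pdf(grid, mu, sigma) * (grid - mu) / sigma ** 2
    integrand = dpi_dmu * critic(grid)
    # Trapezoidal rule written out by hand to avoid depending on a NumPy version.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))


rng = np.random.default_rng(0)
mu, sigma = 0.0, 0.5

samples = reinforce_estimates(mu, sigma, 10_000, rng)
integrated = all_action_gradient(mu, sigma)

# For this toy setup the exact gradient is -2 * (mu - 1), i.e. 2.0 at mu = 0.
print("REINFORCE mean of per-sample estimates:", samples.mean())
print("REINFORCE variance of per-sample estimates:", samples.var())
print("All-action (numerically integrated) gradient:", integrated)
```

In this toy setting the integrated estimate is deterministic given the state, so all of its remaining variance would come from sampling states, whereas each REINFORCE sample also carries the noise of the sampled action. With a learned critic the integral would additionally inherit the critic's bias, which is the issue the abstract's theoretical result addresses.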

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Optimization and Search Problems

Methods: REINFORCE