PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

Matilde Gargiani; Andrea Zanelli; Andrea Martinelli; Tyler Summers; John Lygeros

arXiv:2202.00308 · cs.LG · February 2, 2022 · 1 citation

Open Access

TL;DR

PAGE-PG is a loopless variance-reduced policy gradient method based on probabilistic gradient estimation. It reaches an $\epsilon$-stationary solution with $\tilde{O}(\epsilon^{-3})$ sample complexity, matching leading methods under similar conditions, and performs competitively on classical control tasks.

Contribution

It proposes PAGE-PG, a new loopless variance-reduced policy gradient algorithm that uses probabilistic switching and importance sampling for unbiased gradient estimation.

Findings

01

Achieves $\tilde{O}\left(\frac{1}{\epsilon^3}\right)$ sample complexity for $\epsilon$-stationary solutions.

02

Matches the sample complexity of leading methods under similar conditions.

03

Shows competitive numerical performance on classical control tasks.

Abstract

Despite their success, policy gradient methods suffer from high variance of the gradient estimate, which can result in unsatisfactory sample complexity. Recently, numerous variance-reduced extensions of policy gradient methods with provably better sample complexity and competitive numerical performance have been proposed. After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of updates. Our method is inspired by the PAGE estimator for supervised learning and leverages importance sampling to obtain an unbiased gradient estimator. We show that PAGE-PG enjoys an $O(\epsilon^{-3})$ average sample complexity to reach an $\epsilon$-stationary solution, which…
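The probabilistic switch described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the generic PAGE recursion (with probability `p` draw a fresh large-batch REINFORCE estimate, otherwise reuse the previous estimate plus a small-batch correction) and an importance weight that makes the correction unbiased. The single-step softmax bandit, the batch sizes, the step size, and every function name (`page_pg_step`, `reinforce_grad`, `run`) are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reinforce_grad(theta, actions, rewards):
    """Per-sample REINFORCE estimates r * grad log pi_theta(a) for a softmax policy."""
    pi = softmax(theta)
    onehot = np.eye(theta.size)[actions]
    return rewards[:, None] * (onehot - pi)

def sample(theta, n, reward_means, rng):
    """Draw n single-step episodes from the current policy (toy environment, assumed)."""
    actions = rng.choice(theta.size, size=n, p=softmax(theta))
    rewards = reward_means[actions] + 0.1 * rng.standard_normal(n)
    return actions, rewards

def page_pg_step(theta, theta_prev, g_prev, p, big_n, small_b, reward_means, rng):
    """One PAGE-style gradient estimate (assumed form, see lead-in)."""
    if g_prev is None or rng.random() < p:
        # Update type 1: fresh large-batch REINFORCE estimate.
        a, r = sample(theta, big_n, reward_means, rng)
        return reinforce_grad(theta, a, r).mean(axis=0)
    # Update type 2: previous estimate plus a small-batch correction.
    a, r = sample(theta, small_b, reward_means, rng)
    # Importance weight pi_prev(a) / pi_current(a): samples come from the
    # current policy, so the old-parameter term must be reweighted.
    w = softmax(theta_prev)[a] / softmax(theta)[a]
    diff = reinforce_grad(theta, a, r) - w[:, None] * reinforce_grad(theta_prev, a, r)
    return g_prev + diff.mean(axis=0)

def run(steps=200, lr=0.5, p=0.2, seed=0):
    rng = np.random.default_rng(seed)
    reward_means = np.array([0.1, 0.5, 1.0])  # hypothetical 3-armed bandit
    theta = np.zeros(3)
    theta_prev, g = theta.copy(), None
    for _ in range(steps):
        g = page_pg_step(theta, theta_prev, g, p, 64, 8, reward_means, rng)
        theta_prev, theta = theta, theta + lr * g  # gradient ascent on expected reward
    return softmax(theta)

if __name__ == "__main__":
    print(run())
```

Because a single random draw selects the update type at every iteration, the sketch needs no inner and outer loop, which is the sense in which such a scheme is loopless. The importance weight is what keeps the small-batch correction an unbiased estimate of the gradient difference between consecutive parameter vectors.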

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning