PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation
Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler Summers, John Lygeros

TL;DR
PAGE-PG introduces a novel loopless variance-reduced policy gradient method using probabilistic gradient estimation, achieving optimal sample complexity and demonstrating competitive performance on control tasks.
Contribution
The paper proposes PAGE-PG, a loopless variance-reduced policy gradient algorithm that switches probabilistically between two types of updates and uses importance sampling to keep the gradient estimator unbiased.
Findings
Achieves $\tilde{O}\left(\frac{1}{\epsilon^3}\right)$ sample complexity for $\epsilon$-stationary solutions.
Matches the sample complexity of leading methods under similar conditions.
Shows competitive numerical performance on classical control tasks.
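For readers unfamiliar with the term, one common convention for an $\epsilon$-stationary solution is sketched here. This is an assumption about the convention in use, not a quotation from the paper; $J$ (the expected return) and $\theta$ (the policy parameters) are illustrative symbols.

$$
\mathbb{E}\left[\,\|\nabla J(\theta)\|^2\,\right] \le \epsilon^2
$$

Some papers instead require the expected squared norm to be at most $\epsilon$, which rescales the exponent in the complexity bound, so bounds should be compared under the same definition.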
Abstract
Despite their success, policy gradient methods suffer from high variance of the gradient estimate, which can result in unsatisfactory sample complexity. Recently, numerous variance-reduced extensions of policy gradient methods with provably better sample complexity and competitive numerical performance have been proposed. After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of updates. Our method is inspired by the PAGE estimator for supervised learning and leverages importance sampling to obtain an unbiased gradient estimator. We show that PAGE-PG enjoys an average sample complexity of $\tilde{O}(\epsilon^{-3})$ to reach an $\epsilon$-stationary solution, which…
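The probabilistic switch described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy one-step problem with a 1-D Gaussian policy, a quadratic reward, a plain REINFORCE estimator without baseline, and a constant switching probability `p`. All names (`page_pg_sketch`, `reinforce_grad`, `big_batch`, `small_batch`) are hypothetical.

```python
import numpy as np

def log_prob(a, theta, sigma):
    # Log-density of the Gaussian policy N(theta, sigma^2) at action a.
    return -0.5 * ((a - theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def reinforce_grad(a, theta, sigma, target):
    # Per-sample REINFORCE estimate: reward times score function.
    reward = -(a - target) ** 2          # assumed toy reward
    score = (a - theta) / sigma ** 2     # d/dtheta of log_prob
    return reward * score

def page_pg_sketch(theta0=0.0, target=1.0, sigma=0.5, steps=300, lr=0.05,
                   p=0.3, big_batch=256, small_batch=16, seed=0):
    rng = np.random.default_rng(seed)
    theta, theta_prev, v = theta0, theta0, 0.0
    for t in range(steps):
        if t == 0 or rng.random() < p:
            # Update type 1: fresh large-batch gradient estimate.
            a = rng.normal(theta, sigma, size=big_batch)
            v = reinforce_grad(a, theta, sigma, target).mean()
        else:
            # Update type 2: reuse the previous estimate plus a small-batch
            # correction. Samples come from the current policy, so the
            # gradient at the previous parameters is importance-weighted.
            a = rng.normal(theta, sigma, size=small_batch)
            w = np.exp(log_prob(a, theta_prev, sigma) - log_prob(a, theta, sigma))
            diff = (reinforce_grad(a, theta, sigma, target)
                    - w * reinforce_grad(a, theta_prev, sigma, target))
            v = v + diff.mean()
        # Gradient ascent step on the expected reward.
        theta_prev, theta = theta, theta + lr * v
    return theta

if __name__ == "__main__":
    print(page_pg_sketch())
```

Because the switch is a coin flip at every iteration, there is no nested inner loop, which is what "loopless" refers to. In this toy setup the expected reward is maximized at `theta == target`, so the returned value should end up close to `1.0`.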
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
