Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Pan Xu, Felicia Gao, Quanquan Gu

TL;DR
This paper introduces SRVR-PG, a new policy gradient method that significantly reduces sample complexity in reinforcement learning by employing recursive variance reduction, and demonstrates its effectiveness through experiments.
Contribution
The paper proposes SRVR-PG, a novel policy gradient algorithm with improved sample complexity, and a variant with parameter exploration, advancing reinforcement learning efficiency.
Findings
SRVR-PG achieves $O(1/\epsilon^{3/2})$ sample complexity.
The variant with parameter exploration draws the initial policy parameter from a prior probability distribution.
Numerical experiments validate the improved performance on control problems.
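For a sense of scale, the short calculation below plugs a single accuracy level into the stated bound. It ignores the constants hidden inside $O(\cdot)$, so the numbers show only how the rates scale and are not actual episode counts. The comparison rate $O(1/\epsilon^{5/3})$ is the earlier bound for stochastic variance reduced policy gradient that the paper reports improving on; the choice $\epsilon = 10^{-2}$ is arbitrary.

```latex
\begin{align*}
\epsilon = 10^{-2}: \qquad
  \epsilon^{-3/2} &= 10^{3} = 1000, \\
  \epsilon^{-5/3} &= 10^{10/3} \approx 2154, \\
  \frac{\epsilon^{-5/3}}{\epsilon^{-3/2}} &= \epsilon^{-1/6} = 10^{1/3} \approx 2.15 .
\end{align*}
```

The gap widens slowly as $\epsilon$ shrinks, because the ratio grows only as $\epsilon^{-1/6}$.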
Abstract
Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2 \leq \epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control…
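The abstract names recursive variance reduction without spelling out the update, so the following is a minimal toy sketch of the general idea and not the authors' implementation. It assumes a one-step episode, a scalar Gaussian policy with fixed standard deviation, and a quadratic reward. Each epoch starts from a large-batch gradient estimate, which is then corrected recursively with small batches. An importance weight accounts for the small batch being sampled from the current policy rather than the previous one. All names and hyperparameters (`srvr_style_ascent`, `SIGMA`, `TARGET`, batch sizes, step size) are hypothetical choices for illustration.

```python
import math
import random

SIGMA = 0.5    # fixed policy standard deviation (assumption of this toy setup)
TARGET = 1.0   # reward peaks when the action equals TARGET (toy choice)


def reward(a):
    return -(a - TARGET) ** 2


def grad_estimate(a, theta):
    # REINFORCE estimate for a one-step episode:
    # r(a) * d/dtheta log pi(a | theta), with pi = N(theta, SIGMA^2).
    return reward(a) * (a - theta) / SIGMA ** 2


def importance_weight(a, theta_old, theta_new):
    # pi(a | theta_old) / pi(a | theta_new) for Gaussians sharing SIGMA.
    log_w = ((a - theta_new) ** 2 - (a - theta_old) ** 2) / (2 * SIGMA ** 2)
    return math.exp(log_w)


def srvr_style_ascent(theta0, epochs=10, inner_steps=5,
                      big_batch=500, small_batch=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(epochs):
        # Reference gradient from a large batch at the start of the epoch.
        actions = [rng.gauss(theta, SIGMA) for _ in range(big_batch)]
        v = sum(grad_estimate(a, theta) for a in actions) / big_batch
        theta_prev, theta = theta, theta + lr * v
        for _ in range(inner_steps - 1):
            # Small batch sampled from the *current* policy.
            actions = [rng.gauss(theta, SIGMA) for _ in range(small_batch)]
            # Recursive correction: new gradient minus the importance-weighted
            # gradient at the previous parameter, on the same samples.
            correction = sum(
                grad_estimate(a, theta)
                - importance_weight(a, theta_prev, theta)
                * grad_estimate(a, theta_prev)
                for a in actions
            ) / small_batch
            v = v + correction
            theta_prev, theta = theta, theta + lr * v
    return theta


if __name__ == "__main__":
    print(srvr_style_ascent(0.0))
```

In this toy problem the performance function is $J(\theta) = -((\theta - 1)^2 + \sigma^2)$, so the iterate should drift toward `TARGET`. The correction term shrinks as consecutive parameters get closer, which is why small batches suffice inside an epoch.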
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
