Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu; Felicia Gao; Quanquan Gu

arXiv:1909.08610·cs.LG·August 3, 2021·34 cites

TL;DR

This paper introduces SRVR-PG, a policy gradient method that uses recursive variance reduction to lower the sample complexity of reinforcement learning to $O(1/\epsilon^{3/2})$ episodes, and supports the result with numerical experiments.

Contribution

The paper proposes SRVR-PG, a novel policy gradient algorithm with improved sample complexity, and a variant with parameter exploration, advancing reinforcement learning efficiency.

Findings

01

SRVR-PG needs only $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the performance function $J(\theta)$.

02

The variant with parameter exploration draws the initial policy parameter from a prior probability distribution.

03

Numerical experiments validate the improved performance on control problems.

Abstract

Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\theta)$ (i.e., $\theta$ such that $\|\nabla J(\theta)\|_{2}^{2} \leq \epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control…
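The $O(1/\epsilon^{1/6})$ improvement quoted in the abstract follows from subtracting the two exponents, $5/3 - 3/2 = 1/6$. The short Python sketch below checks that arithmetic and evaluates the resulting factor for a few values of $\epsilon$. It is an illustration only: the function and constant names are hypothetical, and the constants hidden by the big-O notation are assumed to be 1, so the numbers show orders of growth rather than actual episode counts.

```python
from fractions import Fraction

# Exponents of 1/epsilon quoted in the abstract.
SRVR_PG_EXPONENT = Fraction(3, 2)  # SRVR-PG: O(1/eps^{3/2})
PRIOR_EXPONENT = Fraction(5, 3)    # earlier result: O(1/eps^{5/3})

# Dividing the two rates subtracts their exponents.
EXPONENT_GAP = PRIOR_EXPONENT - SRVR_PG_EXPONENT


def episode_order(eps: float, exponent: Fraction) -> float:
    """Return (1/eps)**exponent, with big-O constants assumed to be 1."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (1.0 / eps) ** float(exponent)


def improvement_factor(eps: float) -> float:
    """Ratio of the earlier rate to the SRVR-PG rate at accuracy eps."""
    return episode_order(eps, PRIOR_EXPONENT) / episode_order(eps, SRVR_PG_EXPONENT)


if __name__ == "__main__":
    print("exponent gap:", EXPONENT_GAP)
    for eps in (1e-2, 1e-4, 1e-6):
        print(f"eps={eps:g}  improvement factor ~ {improvement_factor(eps):.3f}")
```

Because the gap in exponents is only $1/6$, the factor grows slowly: at $\epsilon = 10^{-6}$ it is about $10$ under the unit-constant assumption.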

Code & Models

Repositories

xgfelicia/SRVRPG
pytorch

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques