An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
Yanli Liu, Kaiqing Zhang, Tamer Başar, Wotao Yin

TL;DR
This paper sharpens the convergence analysis of policy gradient (PG) and natural policy gradient (NPG) methods, including their variance-reduced variants, and introduces a variance-reduced NPG algorithm (SRVR-NPG) with improved sample complexity and global convergence guarantees.
Contribution
It provides new convergence results for PG and NPG methods and introduces SRVR-NPG, a variance-reduced NPG algorithm with global convergence guarantees and finite-sample efficiency.
Findings
Variance-reduced PG converges to the globally optimal value up to an inherent function approximation error due to the policy parametrization.
NPG has lower sample complexity than PG.
SRVR-NPG achieves global convergence with variance reduction.
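To make the SRVR-NPG idea concrete, here is a minimal sketch on a hypothetical one-state bandit with a log-linear (softmax) policy. It assumes a SARAH/SPIDER-style recursive gradient estimator corrected by importance weights (the recursive variance-reduction idea suggested by the SRVR name) and feeds that estimate into a damped natural-gradient step. The features, rewards, batch sizes, step size, damping, and all function names (`policy`, `srvr_npg`, etc.) are illustrative assumptions, and the direct 2x2 Fisher solve is a simplification; the paper's algorithm may compute the NPG direction differently.

```python
import math
import random

# Hypothetical toy problem (assumption): one state, 3 actions, 2-D features.
FEATURES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
REWARDS = [1.0, 0.2, 0.5]


def policy(theta):
    """Log-linear policy: pi(a) is proportional to exp(theta . phi(a))."""
    logits = [theta[0] * f[0] + theta[1] * f[1] for f in FEATURES]
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def score(theta, a):
    """grad log pi(a | theta) = phi(a) - E_pi[phi]."""
    pi = policy(theta)
    mean = [sum(p * f[i] for p, f in zip(pi, FEATURES)) for i in range(2)]
    return [FEATURES[a][i] - mean[i] for i in range(2)]


def grad_sample(theta, a):
    """Single-sample REINFORCE estimate r(a) * grad log pi(a | theta)."""
    return [REWARDS[a] * s for s in score(theta, a)]


def sample_actions(theta, rng, n):
    return rng.choices(range(len(FEATURES)), weights=policy(theta), k=n)


def mean_vec(vectors):
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]


def npg_direction(theta, actions, v, damping):
    """Solve (F_hat + damping * I) d = v, with F_hat the empirical Fisher matrix."""
    f = [[damping, 0.0], [0.0, damping]]
    for a in actions:
        s = score(theta, a)
        for i in range(2):
            for j in range(2):
                f[i][j] += s[i] * s[j] / len(actions)
    det = f[0][0] * f[1][1] - f[0][1] * f[1][0]
    return [(f[1][1] * v[0] - f[0][1] * v[1]) / det,
            (f[0][0] * v[1] - f[1][0] * v[0]) / det]


def srvr_npg(epochs=20, inner=5, big_batch=200, batch=20,
             eta=0.1, damping=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(epochs):
        # Epoch start: plain large-batch policy-gradient estimate.
        actions = sample_actions(theta, rng, big_batch)
        v = mean_vec([grad_sample(theta, a) for a in actions])
        for _ in range(inner):
            # Natural-gradient step using the current estimate v.
            d = npg_direction(theta, actions, v, damping)
            prev, theta = theta, [t + eta * di for t, di in zip(theta, d)]
            # Recursive update from a small batch drawn at the new policy; the
            # weight pi_prev(a) / pi_new(a) makes the old-gradient term an
            # unbiased estimate of grad J(prev) despite sampling from pi_new.
            actions = sample_actions(theta, rng, batch)
            pi_prev, pi_new = policy(prev), policy(theta)
            corrections = []
            for a in actions:
                w = pi_prev[a] / pi_new[a]
                g_new, g_old = grad_sample(theta, a), grad_sample(prev, a)
                corrections.append([g_new[i] - w * g_old[i] for i in range(2)])
            v = [vi + ci for vi, ci in zip(v, mean_vec(corrections))]
    return theta


if __name__ == "__main__":
    theta = srvr_npg()
    print("theta:", theta)
    print("pi:", policy(theta))
```

Each epoch restarts from a large-batch gradient, while inner iterations only pay for a small batch; the importance weight keeps the recursive correction unbiased even though the new samples come from the updated policy.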
Abstract
In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance-reduction into the NPG update. Our improvements follow from an observation that the convergence of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence…
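The positive-definite Fisher assumption is what makes the NPG direction F(θ)^{-1}∇J(θ) well defined. As a minimal sketch with exact (not sampled) gradients, the code below compares plain PG and NPG on a hypothetical one-state bandit with a log-linear policy over three actions whose 2-D features are affinely independent, so the Fisher matrix is positive definite at every θ. The features, rewards, step size, step count, and function names are illustrative assumptions, not the paper's setup.

```python
import math

# Hypothetical toy problem (assumption): one state, 3 actions, 2-D features.
FEATURES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
REWARDS = [1.0, 0.2, 0.5]


def policy(theta):
    """Log-linear policy: pi(a) is proportional to exp(theta . phi(a))."""
    logits = [theta[0] * f[0] + theta[1] * f[1] for f in FEATURES]
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def value(theta):
    """Expected reward J(theta) = sum_a pi(a) r(a)."""
    return sum(p * r for p, r in zip(policy(theta), REWARDS))


def grad_and_fisher(theta):
    """Exact gradient of J and Fisher matrix E_pi[score score^T]."""
    pi = policy(theta)
    mean = [sum(p * f[i] for p, f in zip(pi, FEATURES)) for i in range(2)]
    scores = [[f[i] - mean[i] for i in range(2)] for f in FEATURES]
    grad = [sum(p * r * s[i] for p, r, s in zip(pi, REWARDS, scores))
            for i in range(2)]
    fisher = [[sum(p * s[i] * s[j] for p, s in zip(pi, scores))
               for j in range(2)] for i in range(2)]
    return grad, fisher


def solve_2x2(m, v):
    """Cramer's rule; requires det(m) != 0 (true when m is positive definite)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def ascend(natural, steps=20, eta=0.5):
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad, fisher = grad_and_fisher(theta)
        # PG follows grad; NPG preconditions it with the inverse Fisher matrix.
        d = solve_2x2(fisher, grad) if natural else grad
        theta = [t + eta * di for t, di in zip(theta, d)]
    return theta


if __name__ == "__main__":
    print("PG  value:", value(ascend(natural=False)))
    print("NPG value:", value(ascend(natural=True)))
```

In this toy parametrization the three rewards are fit exactly by the features plus an offset, so the NPG direction equals (0.3, -0.5) at every θ and NPG behaves like a multiplicative-weights update on the action probabilities; that is a property of the example, not a general result. Exact gradients also sidestep sampling, so the sketch illustrates the update rules rather than the sample-complexity bounds.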
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Reinforcement Learning in Robotics
