Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

TL;DR
This paper proves that natural policy gradient methods with log-linear policies achieve linear convergence in infinite-horizon discounted MDPs, with $\tilde{O}(1/\epsilon^2)$ sample complexity, without needing entropy regularization.
Contribution
It establishes the first linear convergence rates for NPG and Q-NPG with log-linear policies using a simple step size scheme, avoiding entropy regularization.
Findings
Both NPG and Q-NPG attain linear convergence rates.
Sample complexity is $\tilde{O}(1/\epsilon^2)$ with a simple step size.
Sublinear convergence rates are also derived for constant step sizes.
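The findings above can be read against the following standard definitions. The symbols $\phi$, $\theta$, $\eta_k$, $c$, $C$, $q$ and $\mu$ are notation assumed here for illustration and do not appear in the summary. The exact constants, and any additive error terms from function approximation or estimation, are given in the paper and are not reproduced here.

$$
\pi_\theta(a \mid s) = \frac{\exp\big(\phi(s,a)^\top \theta\big)}{\sum_{a'} \exp\big(\phi(s,a')^\top \theta\big)}
\qquad \text{(log-linear policy with feature map } \phi \text{)}
$$

$$
\eta_k = \eta_0 \, c^{k}, \quad c > 1
\qquad \text{(non-adaptive, geometrically increasing step size)}
$$

$$
V^\star(\mu) - V^{\pi_{\theta_k}}(\mu) \le C \, q^{k}, \quad q \in (0,1)
\qquad \text{(linear convergence, possibly up to an additive error floor)}
$$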
Abstract
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
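As a rough illustration of the update described in the abstract, here is a minimal NumPy sketch of a Q-NPG-style iteration with a log-linear policy and a geometrically increasing step size. It is a sketch under several assumptions, not the authors' implementation: Q-values are computed exactly from a known model rather than sampled, the regression of Q onto the features uses uniform weights over state-action pairs, and the growth factor `1 / gamma` is a hypothetical default (the abstract only says the schedule is geometric). The function names `softmax`, `exact_q`, and `q_npg` are made up for this example.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def exact_q(P, r, pi, gamma):
    # P: (S, A, S) transition tensor, r: (S, A) rewards, pi: (S, A) policy.
    # Assumption: the model is known, so Q is computed exactly.
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)      # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)                # expected one-step reward
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * (P @ v)                 # (S, A)

def q_npg(P, r, phi, gamma, iters=30, eta0=1.0, growth=None):
    # phi: (S, A, d) feature tensor; the policy is softmax(phi @ theta).
    S, A, d = phi.shape
    growth = 1.0 / gamma if growth is None else growth  # hypothetical default
    theta = np.zeros(d)
    eta = eta0
    values = []
    for _ in range(iters):
        pi = softmax(phi @ theta)
        q = exact_q(P, r, pi, gamma)
        # Average value over a uniform initial state distribution.
        values.append(float((pi * q).sum(axis=1).mean()))
        # Compatible function approximation step: fit w so that phi @ w ~ Q
        # (uniform weighting over state-action pairs is an assumption).
        w, *_ = np.linalg.lstsq(phi.reshape(S * A, d), q.reshape(S * A), rcond=None)
        theta = theta + eta * w                # mirror-descent-style update
        eta *= growth                          # geometrically increasing step
    return theta, values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, d, gamma = 5, 3, 4, 0.9
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((S, A))
    phi = rng.standard_normal((S, A, d))
    theta, values = q_npg(P, r, phi, gamma)
    print(values[0], values[-1])
```

With one-hot features (`d = S * A`) the regression is exact and the iteration reduces to the tabular policy mirror descent update, which is a convenient sanity check. With lower-dimensional features the fit is only approximate, which is the inexactness the abstract refers to.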
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
