Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

Rui Yuan; Simon S. Du; Robert M. Gower; Alessandro Lazaric; Lin Xiao

arXiv:2210.01400·cs.LG·February 22, 2023·1 cites

TL;DR

This paper proves that natural policy gradient methods with log-linear policies achieve linear convergence in infinite-horizon discounted MDPs, with $\tilde{O}(1/\epsilon^{2})$ sample complexity, without entropy or other strongly convex regularization.

Contribution

It establishes the first linear convergence rates for NPG and Q-NPG with log-linear policies using a simple, non-adaptive, geometrically increasing step size, avoiding entropy regularization.

Findings

01

Both NPG and Q-NPG attain linear convergence rates.

02

Sample complexity is $\tilde{O}(1/\epsilon^{2})$ with a simple step size.

03

Sublinear convergence rates are also derived for constant step sizes.

Abstract

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{O}(1/\epsilon^{2})$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
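To make the setting concrete, here is a minimal Python sketch of a Q-NPG-style loop with a log-linear policy $\pi_\theta(a \mid s) \propto \exp(\theta^\top \phi(s,a))$ and a geometrically increasing step size. It is an illustration under stated assumptions, not the paper's algorithm as specified: the MDP is a small random one, the Q-function is computed exactly rather than sampled, the regression is an unweighted least-squares fit over all state-action pairs, the problem is written as reward maximization, and the growth factor `1 / gamma` and the initial step size `eta0 = 1.0` are illustrative choices rather than the paper's step size conditions. All function names are hypothetical.

```python
import numpy as np

def log_linear_policy(theta, phi):
    """pi(a|s) proportional to exp(theta^T phi(s, a)); phi has shape (S, A, d)."""
    logits = phi @ theta                                  # shape (S, A)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def q_values(P, r, pi, gamma):
    """Exact Q^pi for a tabular MDP with P of shape (S, A, S) and r of shape (S, A)."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)                 # state transitions under pi
    r_pi = (pi * r).sum(axis=1)                           # expected reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)   # V^pi
    return r + gamma * P @ v                              # shape (S, A)

def q_npg(P, r, phi, gamma, eta0, iters):
    """Q-NPG-style loop: fit Q^pi linearly in phi, then step theta along the fit."""
    S, A, d = phi.shape
    X = phi.reshape(S * A, d)
    theta, eta, values = np.zeros(d), eta0, []
    for _ in range(iters):
        pi = log_linear_policy(theta, phi)
        q = q_values(P, r, pi, gamma)
        values.append((pi * q).sum(axis=1).mean())        # average value over states
        # Assumption: unweighted least squares instead of an on-policy weighted fit.
        w, *_ = np.linalg.lstsq(X, q.reshape(S * A), rcond=None)
        theta = theta + eta * w                           # ascent step (rewards)
        eta = eta / gamma                                 # illustrative geometric growth
    return theta, values

# Toy instance (hypothetical): random MDP with one-hot features, so the
# log-linear class coincides with the tabular softmax class.
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))
phi = np.eye(S * A).reshape(S, A, S * A)
theta, values = q_npg(P, r, phi, gamma, eta0=1.0, iters=30)
print(values[0], values[-1])
```

With one-hot features the linear fit recovers $Q^\pi$ exactly, so the loop reduces to tabular NPG. With lower-dimensional features the fit is only approximate, which corresponds to the inexact PMD view described in the abstract.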

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics