Geometry and convergence of natural policy gradient methods

Johannes Müller; Guido Montúfar

arXiv:2211.02105 · math.OC · February 21, 2024 · 1 citation


Open Access

TL;DR

This paper analyzes the convergence of natural policy gradient (NPG) methods in reinforcement learning. It shows that their state-action trajectories are gradient flows with respect to Hessian geometries and derives convergence guarantees for various regularizations and geometries.

Contribution

It introduces a geometric framework for NPG convergence based on Hessian geometries. Within it, the paper derives global convergence rates for NPG flows and local rates for discrete-time NPG, and connects discrete regularized NPG to inexact Newton methods.

Findings

01. Global convergence guarantees for several NPG variants.

02. Linear convergence rates for unregularized and regularized NPG flows with the Kakade and Morimura metrics.

03. Local quadratic convergence of discrete-time regularized NPG, interpreted as an inexact Newton method.
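To make the linear-rate claim concrete, the sketch below runs entropy-regularized NPG on a small tabular MDP. It assumes the tabular softmax parametrization, for which NPG with Kakade's metric has the known closed-form update π_{k+1}(a|s) ∝ π_k(a|s)^{1−ητ/(1−γ)} exp(η Q_τ^{π_k}(s,a)/(1−γ)) for 0 < η ≤ (1−γ)/τ (Cen et al.). This is not the paper's state-action-space analysis. The random 3-state, 2-action MDP, the values γ = 0.9 and τ = 0.1, and all function names (`make_mdp`, `evaluate`, `optimal_value`, `regularized_npg`) are hypothetical choices for illustration. At the largest step η = (1−γ)/τ the update reduces to soft policy iteration, which can be viewed as a Newton-type scheme in the same way classical policy iteration is Newton's method on the Bellman equation.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def make_mdp(n_states=3, n_actions=2, seed=0):
    """Hypothetical random MDP: P[s][a][t] = Pr(t | s, a), rewards r[s][a] in [0, 1)."""
    rng = random.Random(seed)
    P = [[None] * n_actions for _ in range(n_states)]
    r = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            P[s][a] = [x / total for x in w]
            r[s][a] = rng.random()
    return P, r


def soft_bellman_q(P, r, V, gamma):
    """Q(s, a) = r(s, a) + gamma * E_{t ~ P(.|s, a)} V(t)."""
    return [[r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(r[0]))] for s in range(len(r))]


def evaluate(P, r, pi, gamma, tau, iters=400):
    """Entropy-regularized Q and V of policy pi by fixed-point iteration (a gamma-contraction)."""
    V = [0.0] * len(r)
    for _ in range(iters):
        Q = soft_bellman_q(P, r, V, gamma)
        V = [sum(p * (q - tau * math.log(p)) for p, q in zip(pi[s], Q[s]))
             for s in range(len(r))]
    return Q, V


def optimal_value(P, r, gamma, tau, iters=400):
    """Soft value iteration: V*(s) = tau * log sum_a exp(Q*(s, a) / tau)."""
    V = [0.0] * len(r)
    for _ in range(iters):
        Q = soft_bellman_q(P, r, V, gamma)
        V = [tau * logsumexp([q / tau for q in Q[s]]) for s in range(len(r))]
    return V


def regularized_npg(P, r, gamma=0.9, tau=0.1, eta=None, steps=50):
    """Tabular softmax NPG with entropy regularization.

    Returns gaps[k] = max_s (V*(s) - V^{pi_k}(s)) for k = 0, ..., steps - 1.
    """
    if eta is None:
        eta = (1 - gamma) / tau  # largest admissible step: soft policy iteration
    assert 0 < eta <= (1 - gamma) / tau
    S, A = len(r), len(r[0])
    V_star = optimal_value(P, r, gamma, tau)
    pi = [[1.0 / A] * A for _ in range(S)]  # uniform initial policy
    keep = 1 - eta * tau / (1 - gamma)  # exponent on the previous policy
    gaps = []
    for _ in range(steps):
        Q, V = evaluate(P, r, pi, gamma, tau)
        gaps.append(max(vs - v for vs, v in zip(V_star, V)))
        for s in range(S):
            # pi_{k+1}(a|s) ∝ pi_k(a|s)^keep * exp(eta * Q(s, a) / (1 - gamma)), in log space
            logits = [keep * math.log(p) + eta * q / (1 - gamma) for p, q in zip(pi[s], Q[s])]
            z = logsumexp(logits)
            pi[s] = [math.exp(l - z) for l in logits]
    return gaps
```

For example, `regularized_npg(*make_mdp(), steps=200)` returns a nonincreasing gap sequence. At the default step the gap shrinks at least by a factor γ per iteration, by the standard policy-improvement argument; passing a smaller `eta` gives a slower linear decay.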

Abstract

We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and Morimura and co-authors by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if…
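The abstract identifies the Kakade and Morimura metrics with the Hessian geometries of conditional entropy and entropy, respectively. As a worked illustration, here are the Hessians of the negative entropy φ_E and the negative conditional entropy φ_C. The notation is assumed for this computation and is not necessarily the paper's: μ is a positive state-action frequency vector, μ_s = Σ_a μ_{sa}, and π(a|s) = μ_{sa}/μ_s. The induced metric is g_μ(u, v) = uᵀ∇²φ(μ)v.

```latex
\begin{align*}
\phi_E(\mu) &= \sum_{s,a} \mu_{sa}\log\mu_{sa},
&\frac{\partial^2 \phi_E}{\partial\mu_{sa}\,\partial\mu_{s'a'}} &= \frac{\delta_{ss'}\delta_{aa'}}{\mu_{sa}},\\
\phi_C(\mu) &= \sum_{s,a} \mu_{sa}\log\frac{\mu_{sa}}{\mu_s},
&\frac{\partial \phi_C}{\partial\mu_{sa}} &= \log\frac{\mu_{sa}}{\mu_s} = \log\pi(a\mid s),\\
& &\frac{\partial^2 \phi_C}{\partial\mu_{sa}\,\partial\mu_{s'a'}} &= \delta_{ss'}\Big(\frac{\delta_{aa'}}{\mu_{sa}} - \frac{1}{\mu_s}\Big),\\
g^E_\mu(u,v) &= \sum_{s,a}\frac{u_{sa}v_{sa}}{\mu_{sa}},
&g^C_\mu(u,v) &= \sum_{s,a}\frac{u_{sa}v_{sa}}{\mu_{sa}} - \sum_{s}\frac{u_s v_s}{\mu_s},
\quad u_s = \sum_a u_{sa}.
\end{align*}
```

The conditional-entropy metric vanishes on directions u_{sa} = c_s μ_{sa}, which rescale the state marginals without changing π, so on the whole of ℝ^{S×A} it is only positive semidefinite.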


Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques