Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings

Matthew S. Zhang; Murat A. Erdogdu; Animesh Garg

arXiv:2111.00185·cs.LG·April 8, 2022

TL;DR

This paper establishes explicit convergence rates for policy gradient methods in weakly smooth settings, broadening their applicability and providing performance guarantees for the resulting policies.

Contribution

The paper extends the convergence analysis of policy gradient methods to weakly smooth policy classes with $L_2$ integrable gradients, replacing the strict regularity conditions of earlier work with more practical ones.

Findings

01 Explicit convergence rates are established for both the standard and the natural policy gradient algorithms.

02 The analysis applies to weakly smooth policy classes with $L_2$ integrable gradients.

03 Performance guarantees are provided for the converged policies.

Abstract

Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with $L_{2}$ integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. Notably, our analysis also shows that convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly we provide performance guarantees for the converged policies.
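To make the two algorithms named in the abstract concrete, here is a minimal sketch of the standard and the natural policy gradient updates. It is not taken from the paper: the three-armed bandit, the softmax parameterization, the reward values, the step size, and all function names are assumptions chosen for illustration, and this toy problem is smooth, so it does not exercise the weakly smooth setting the paper analyzes.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(theta, rewards):
    """Objective J(theta) = sum_a pi_theta(a) * r(a)."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def policy_gradient(theta, rewards):
    """Exact gradient of J for a softmax policy: dJ/dtheta_a = pi(a) * (r(a) - J)."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]


def natural_policy_gradient(theta, rewards):
    """Natural gradient direction for the softmax policy.

    With Fisher matrix F = diag(pi) - pi pi^T, the vector x_a = r(a) - J
    solves F x = grad J. Other solutions differ by a constant shift of all
    logits, which leaves the softmax policy unchanged.
    """
    j = expected_reward(theta, rewards)
    return [r - j for r in rewards]


def run(direction, rewards, step_size=0.4, num_steps=200):
    """Gradient ascent from uniform logits; returns the final expected reward."""
    theta = [0.0] * len(rewards)
    for _ in range(num_steps):
        d = direction(theta, rewards)
        theta = [t + step_size * g for t, g in zip(theta, d)]
    return expected_reward(theta, rewards)


if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.0]  # hypothetical per-arm rewards
    print("standard PG:", run(policy_gradient, rewards))
    print("natural PG: ", run(natural_policy_gradient, rewards))
```

In this sketch the natural gradient preconditions the update with the Fisher information of the policy, so the step no longer shrinks as the action probabilities approach 0 or 1. That is the usual motivation for treating the two algorithms separately in a convergence analysis.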

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Adaptive Dynamic Programming Control