Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings
Matthew S. Zhang, Murat A. Erdogdu, Animesh Garg

TL;DR
This paper establishes explicit convergence rates for policy gradient methods in weakly smooth settings, broadening their applicability and providing performance guarantees for the resulting policies.
Contribution
The paper extends the convergence analysis of policy gradient methods to weakly smooth policy classes with $L_2$-integrable gradients, under conditions that are more practical than the strict regularity assumptions of prior analyses.
Findings
Convergence rates are achieved for both standard and natural policy gradient algorithms.
The analysis applies to weakly smooth policy classes with $L_2$ integrable gradients.
Performance guarantees are provided for the converged policies.
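For orientation, the two updates named in these findings are usually written as below. This is the standard textbook form, not a restatement of the paper's exact algorithms; the step size $\eta$, objective $J$, and Fisher information matrix $F$ are notation assumed here for illustration.

$$
\theta_{t+1} = \theta_t + \eta \, \nabla_\theta J(\theta_t) \qquad \text{(standard policy gradient)}
$$

$$
\theta_{t+1} = \theta_t + \eta \, F(\theta_t)^{\dagger} \, \nabla_\theta J(\theta_t), \qquad F(\theta) = \mathbb{E}_{s,a}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \, \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right] \qquad \text{(natural policy gradient)}
$$

Weak smoothness is commonly formalized as a Hölder condition on a gradient map $g$, with assumed constants $L > 0$ and $\beta \in (0, 1]$; the paper's precise assumption may be stated differently:

$$
\| g(\theta) - g(\theta') \| \le L \, \| \theta - \theta' \|^{\beta}
$$

Setting $\beta = 1$ recovers the usual Lipschitz-gradient (smooth) case, so smaller $\beta$ corresponds to a weaker requirement.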
Abstract
Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical, and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with $L_2$ integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. Notably, our analysis also shows that convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly, we provide performance guarantees for the converged policies.
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Adaptive Dynamic Programming Control
