On the Global Convergence Rates of Softmax Policy Gradient Methods

Jincheng Mei; Chenjun Xiao; Csaba Szepesvari; Dale Schuurmans

arXiv:2005.06392·cs.LG·June 3, 2022·27 cites

TL;DR

This paper analyzes the convergence rates of softmax policy gradient methods in the tabular setting and shows that entropy regularization accelerates convergence from sublinear, $O(1/t)$, to linear, $O(e^{-c \cdot t})$.

Contribution

The paper establishes the first $O(1/t)$ convergence rate for softmax policy gradient with the true gradient, and shows that entropy-regularized policy gradient achieves a faster linear rate, which helps explain the empirical benefits of entropy regularization.

Findings

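01. Softmax policy gradient with the true gradient converges at an $O(1/t)$ rate.

02. Entropy-regularized policy gradient converges at a linear rate, $O(e^{-c \cdot t})$ with $c > 0$.

03. Entropy regularization speeds up policy optimization, moving convergence from sublinear to linear.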
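
For orientation, the objects these findings refer to can be written out in the simplest case, a single state with $K$ actions and a reward vector $r$. The notation below ($\theta$, $\pi_\theta$, $r$, $\pi^*$, $\theta_t$) is assumed here for illustration and is not taken from the page; the constants hidden in the rates depend on the problem and the initialization, as the abstract notes.

$$
\pi_\theta(a) = \frac{\exp(\theta(a))}{\sum_{a'} \exp(\theta(a'))},
\qquad
\frac{\partial\, \pi_\theta^\top r}{\partial \theta(a)} = \pi_\theta(a)\,\bigl(r(a) - \pi_\theta^\top r\bigr)
$$

$$
\text{without regularization: } (\pi^* - \pi_{\theta_t})^\top r = O(1/t),
\qquad
\text{with entropy regularization: suboptimality} = O(e^{-c \cdot t}),\ c > 0
$$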

Abstract

We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and that the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy-regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-c \cdot t})$ toward the softmax optimal policy $(c > 0)$. This result resolves an open question in the recent literature. Finally, combining the above two results and additional new…
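
The two rates in the abstract can be observed numerically on a toy problem. The following is a minimal sketch, assuming a one-state problem (a bandit with a known reward vector) so that the true gradient is available in closed form; it is not the paper's experimental code. The function names (`softmax`, `objective`, `policy_gradient`, `run`), the reward vector, the step size `eta`, and the temperature `tau` are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def objective(pi, r, tau):
    # Expected reward, plus tau times the entropy of pi when tau > 0.
    if tau > 0:
        return pi @ (r - tau * np.log(pi))
    return pi @ r

def policy_gradient(theta, r, tau=0.0):
    # True gradient of the (optionally entropy-regularized) objective
    # with respect to theta, for a one-state problem:
    #   (diag(pi) - pi pi^T) (r - tau * log pi)
    pi = softmax(theta)
    adj = r - tau * np.log(pi) if tau > 0 else r
    return pi * (adj - pi @ adj)

def run(r, eta, steps, tau=0.0):
    # Gradient ascent from a uniform policy; returns the suboptimality
    # gap measured before each update.
    theta = np.zeros_like(r)
    if tau > 0:
        # Maximizer of the regularized objective is softmax(r / tau).
        best = objective(softmax(r / tau), r, tau)
    else:
        best = r.max()
    gaps = []
    for _ in range(steps):
        gaps.append(best - objective(softmax(theta), r, tau))
        theta = theta + eta * policy_gradient(theta, r, tau)
    return np.array(gaps)

if __name__ == "__main__":
    r = np.array([1.0, 0.8, 0.6])  # illustrative rewards in [0, 1]
    gaps_pg = run(r, eta=0.4, steps=500, tau=0.0)
    gaps_reg = run(r, eta=0.4, steps=500, tau=0.5)
    for t in (1, 10, 100, 499):
        print(t, gaps_pg[t], gaps_reg[t])
```

Note that the two gaps are measured against different objectives (expected reward versus entropy-regularized reward), so the comparison is about the shape of the decay rather than the absolute values: in this sketch the unregularized gap shrinks roughly like $1/t$, while the regularized gap falls geometrically toward the softmax optimal policy.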

Videos

On the Global Convergence Rates of Softmax Policy Gradient Methods· slideslive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research

Methods: Entropy Regularization · Softmax