Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

Yan Li; Guanghui Lan; Tuo Zhao

arXiv:2201.09457 · cs.LG · November 30, 2022 · 1 citation

Open Access

TL;DR

This paper introduces homotopic policy mirror descent (HPMD), a new policy gradient method with strong convergence guarantees and improved sample complexity for solving discounted MDPs, extending to stochastic settings and various divergence measures.

Contribution

The paper presents HPMD with global and local convergence guarantees, certifies the limiting policy as optimal with maximal entropy, and extends results to stochastic versions and diverse divergence functions.

Findings

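01 Global linear convergence of HPMD with KL divergence, for both the optimality gap and a weighted distance to the set of optimal policies.

02 Local superlinear convergence of both quantities, without any assumption.

03 Improved sample complexity under a generative model.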

Abstract

We propose a new policy gradient method, named homotopic policy mirror descent (HPMD), for solving discounted, infinite horizon MDPs with finite state and action spaces. HPMD performs a mirror descent type policy update with an additional diminishing regularization term, and possesses several computational properties that seem to be new in the literature. We first establish the global linear convergence of HPMD instantiated with Kullback-Leibler divergence, for both the optimality gap, and a weighted distance to the set of optimal policies. Then local superlinear convergence is obtained for both quantities without any assumption. With local acceleration and diminishing regularization, we establish the first result among policy gradient methods on certifying and characterizing the limiting policy, by showing, with a non-asymptotic characterization, that the last-iterate policy converges…
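The update described in the abstract (a mirror descent step plus a diminishing regularization term) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the MDP is cost-minimizing, the regularizer is the KL divergence to a uniform initial policy, policy evaluation is exact (no sampling), and the schedule `tau = tau0 * gamma ** (2 * (k + 1))` with `1 + eta * tau = 1 / gamma` is an illustrative choice. The names `q_values`, `hpmd`, and `tau0` are hypothetical. Under these assumptions the per-state minimization has a closed form: the new policy is proportional to `pi_k ** (1 / (1 + eta * tau)) * exp(-eta * Q / (1 + eta * tau))`.

```python
import numpy as np

def q_values(P, c, pi, gamma):
    """Exact Q^pi for a tabular MDP with costs c[s, a] and transitions P[s, a, s']."""
    n_s = c.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    c_pi = np.sum(pi * c, axis=1)           # expected one-step cost under pi
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, c_pi)
    return c + gamma * P @ V                # shape (n_s, n_a)

def hpmd(P, c, gamma, n_iters=100, tau0=1.0):
    """Sketch of a KL-based mirror descent update with diminishing regularization.

    Assumed (not taken from the paper): uniform initial policy, exact Q-values,
    and the schedule for tau and eta below.
    """
    n_s, n_a = c.shape
    log_pi = np.full((n_s, n_a), -np.log(n_a))   # uniform pi_0
    for k in range(n_iters):
        tau = tau0 * gamma ** (2 * (k + 1))      # assumed diminishing regularization
        eta = (1.0 / gamma - 1.0) / tau          # chosen so that 1 + eta * tau = 1 / gamma
        Q = q_values(P, c, np.exp(log_pi), gamma)
        # Closed-form minimizer of
        #   eta * (<Q, p> + tau * KL(p || uniform)) + KL(p || pi_k)
        # over the simplex, computed in log space for numerical stability.
        logits = (log_pi - eta * Q) / (1.0 + eta * tau)
        logits -= logits.max(axis=1, keepdims=True)
        log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.exp(log_pi)
```

Working in log space keeps the update stable as the step size grows; with a sampled rather than exact `Q`, this sketch would correspond to the stochastic setting the paper also covers, which is not attempted here.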

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning