On the Global Optimum Convergence of Momentum-based Policy Gradient

Yuhao Ding; Junzi Zhang; Javad Lavaei

arXiv:2110.10116·cs.LG·May 24, 2022

Open Access

TL;DR

This paper proves the first global convergence results for momentum-based policy gradient methods in reinforcement learning, showing improved sample complexity and providing a framework for analyzing stochastic PG algorithms.

Contribution

It establishes the first global convergence guarantees for momentum-enhanced policy gradient methods, with improved sample complexity bounds for different policy parametrizations.

Findings

01

Momentum improves global optimality sample complexity.

02

First single-loop, finite-batch PG algorithm with $\tilde{O}(\epsilon^{-3})$ complexity.

03

Framework applicable to various PG estimators.

Abstract

Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexity of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our work is the first one that obtains global convergence results for the…
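
As a rough illustration of what "a stochastic PG method with a momentum term" can look like, here is a minimal sketch on a toy three-armed bandit with a soft-max policy. It is not the algorithm analyzed in the paper: the exponential-moving-average momentum rule, the bandit setting, and every name and constant (`momentum_pg`, `reinforce_grad`, `beta`, `lr`, the reward values) are assumptions chosen for illustration only.

```python
import math
import random


def softmax(theta):
    """Numerically stable soft-max over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_grad(theta, mean_rewards, rng):
    """One-sample REINFORCE estimate of the gradient of the expected reward."""
    probs = softmax(theta)
    action = rng.choices(range(len(theta)), weights=probs)[0]
    reward = mean_rewards[action] + rng.gauss(0.0, 0.1)  # noisy reward (assumed noise level)
    # For a soft-max policy, grad log pi(action) = one_hot(action) - probs.
    return [reward * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]


def momentum_pg(mean_rewards, steps=4000, lr=0.05, beta=0.9, seed=0):
    """Stochastic policy gradient ascent with a moving-average momentum term."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)  # policy logits
    d = [0.0] * len(mean_rewards)      # momentum-averaged gradient direction
    for _ in range(steps):
        g = reinforce_grad(theta, mean_rewards, rng)
        # Momentum: blend the previous direction with the fresh noisy estimate.
        d = [beta * di + (1.0 - beta) * gi for di, gi in zip(d, g)]
        # Ascent step along the averaged direction.
        theta = [t + lr * di for t, di in zip(theta, d)]
    return softmax(theta)


# Hypothetical bandit whose third arm has the highest mean reward.
final_probs = momentum_pg([0.2, 0.5, 1.0])
print(final_probs)
```

The averaging step is the only difference from a vanilla single-sample PG loop: setting `beta=0.0` recovers the plain update, so the two can be compared directly on the same seed.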

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Machine Learning and ELM