On the Global Optimum Convergence of Momentum-based Policy Gradient
Yuhao Ding, Junzi Zhang, Javad Lavaei

TL;DR
This paper proves the first global convergence results for momentum-based policy gradient methods in reinforcement learning, showing improved sample complexity and providing a framework for analyzing stochastic PG algorithms.
Contribution
It establishes the first global convergence guarantees for momentum-enhanced policy gradient methods, with improved sample complexity bounds for different policy parametrizations.
Findings
Momentum improves global optimality sample complexity.
First single-loop, finite-batch PG algorithm with $\tilde{O}(\epsilon^{-3})$ complexity.
Framework applicable to various PG estimators.
Abstract
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum term improves the global optimality sample complexity of vanilla PG methods by $\tilde{O}(\epsilon^{-1.5})$ and $\tilde{O}(\epsilon^{-1})$, respectively, where $\epsilon > 0$ is the target tolerance. Our work is the first one that obtains global convergence results for the…
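To make the idea of a momentum term on a stochastic policy gradient concrete, here is a minimal Python sketch on a toy multi-armed bandit with a soft-max policy. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the text above does not specify the update rule, so the sketch substitutes a plain exponential moving average of REINFORCE gradient estimates. The function names, the bandit setup, and the parameters `lr`, `beta`, and `noise_std` are all hypothetical.

```python
import numpy as np


def softmax(theta):
    """Numerically stable soft-max over a 1-D parameter vector."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_grad(theta, action, reward):
    """Single-sample REINFORCE estimate: reward * grad log pi(action)."""
    pi = softmax(theta)
    g = -pi                # grad of log pi(a) w.r.t. theta is onehot(a) - pi
    g[action] += 1.0
    return reward * g


def momentum_step(d, g, beta):
    """Assumed momentum rule: exponential moving average of gradients."""
    return beta * d + (1.0 - beta) * g


def run_momentum_pg(mean_rewards, steps=2000, lr=0.1, beta=0.9,
                    noise_std=0.0, seed=0):
    """Single-loop, batch-size-one gradient ascent with a momentum term."""
    rng = np.random.default_rng(seed)
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    theta = np.zeros(len(mean_rewards))
    d = np.zeros_like(theta)
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choice(len(theta), p=pi)           # sample one action
        r = mean_rewards[a] + noise_std * rng.standard_normal()
        g = reinforce_grad(theta, a, r)            # stochastic gradient
        d = momentum_step(d, g, beta)              # momentum-averaged direction
        theta = theta + lr * d                     # ascent step
    return softmax(theta)


if __name__ == "__main__":
    policy = run_momentum_pg([0.0, 0.0, 1.0])
    print("final policy:", policy)
```

Setting `beta=0.0` recovers the vanilla single-sample update, which makes it easy to compare the two side by side on the same seed. The averaging smooths the noisy per-sample gradients; whether it yields the sample-complexity gains stated above depends on the specific estimator and assumptions in the paper, which this toy does not reproduce.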
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Machine Learning and ELM
