Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data
Bowen Song, Andrea Iannelli

TL;DR
This paper analyzes the convergence of model-free policy gradient methods for stochastic LQR problems, providing theoretical guarantees and exploring techniques to improve robustness and efficiency.
Contribution
The paper offers the first convergence analysis of model-free PG methods for LQR with additive stochastic noise, covering adaptive step sizes and variance reduction techniques.
Findings
Global convergence guarantees are established for various PG algorithms.
Gradient estimation errors are characterized and linked to convergence behavior.
Adaptive step sizes and variance reduction improve convergence rate and sample efficiency.
Abstract
Policy gradient (PG) methods are the backbone of many reinforcement learning algorithms due to their good performance in policy optimization problems. As a gradient-based approach, PG methods typically rely on knowledge of the system dynamics. If this is not available, trajectory data can be utilized to approximate first-order information. When the data are noisy, gradient estimates become inaccurate and a study that investigates uncertainty estimation and the analysis of its propagation through the algorithm is currently missing. To address this, our work focuses on the Linear Quadratic Regulator (LQR) problem for systems subject to additive stochastic noise. After briefly summarizing the state of the art for cases with a known model, we focus on scenarios where the system dynamics are unknown, and approximate gradient information is obtained using zeroth-order optimization techniques.…
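To make the zeroth-order idea concrete, here is a minimal Python sketch of a two-point gradient estimate for a state-feedback gain, built only from noisy closed-loop rollouts. It is an illustration under stated assumptions, not the paper's algorithm: the two-point estimator, the finite-horizon average cost, the Gaussian noise model, and all names (rollout_cost, zeroth_order_gradient, radius, num_samples) are hypothetical choices made for this example.

```python
import numpy as np


def rollout_cost(A, B, Q, R, K, horizon, noise_std, rng):
    """Average stage cost of u = -K x over one noisy finite-horizon rollout.

    Assumed dynamics: x_{t+1} = A x_t + B u_t + w_t with Gaussian w_t.
    """
    n = A.shape[0]
    x = rng.standard_normal(n)  # random initial state (an assumption)
    total = 0.0
    for _ in range(horizon):
        u = -K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + noise_std * rng.standard_normal(n)
    return total / horizon


def zeroth_order_gradient(A, B, Q, R, K, radius, num_samples,
                          horizon, noise_std, rng):
    """Two-point zeroth-order estimate of the cost gradient with respect to K.

    The simulator only plays the role of the unknown system: the estimator
    itself uses nothing but the observed rollout costs.
    """
    m, n = K.shape
    grad = np.zeros((m, n))
    for _ in range(num_samples):
        # Random direction on the unit sphere (Frobenius norm).
        U = rng.standard_normal((m, n))
        U /= np.linalg.norm(U)
        c_plus = rollout_cost(A, B, Q, R, K + radius * U,
                              horizon, noise_std, rng)
        c_minus = rollout_cost(A, B, Q, R, K - radius * U,
                               horizon, noise_std, rng)
        grad += (c_plus - c_minus) * U
    # Scale by the dimension m*n and by the finite-difference step 2*radius.
    return (m * n) / (2.0 * radius * num_samples) * grad


# Toy double-integrator-like system with a stabilizing initial gain.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[1.0, 1.5]])

grad_estimate = zeroth_order_gradient(A, B, Q, R, K, radius=0.05,
                                      num_samples=20, horizon=50,
                                      noise_std=0.1, rng=rng)
K_next = K - 1e-2 * grad_estimate  # one plain gradient step, fixed step size
```

Because each rollout is driven by a different noise realization, the two cost evaluations differ even for the same gain, so the estimate carries an error that depends on radius, num_samples, and the noise level. This is the kind of gradient estimation error whose effect on convergence the paper studies.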
