Convergence and Sample Complexity of Policy Gradient Methods for Stabilizing Linear Systems
Feiran Zhao, Xingyun Fu, Keyou You

TL;DR
This paper analyzes the convergence and sample complexity, measured in system rollouts, of policy gradient methods for stabilizing linear time-invariant systems. It introduces an explicit rule for adaptively adjusting the discount factor of a discounted LQR problem and validates the theory in simulation.
Contribution
It presents an explicit adaptive discount factor rule based on the stability margin of a linear control policy, and establishes sample complexity bounds for policy gradient stabilization of linear systems.
Findings
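Compared with solving the LQR problem from a known stabilizing policy, stabilization adds only a coefficient that is logarithmic in the spectral radius of the state matrix to the sample complexity.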
The discount factor is adjusted adaptively by an explicit rule that uses the stability margin of the current linear control policy.
Simulations confirm the theoretical results and show effectiveness on nonlinear systems.
Abstract
System stabilization via policy gradient (PG) methods has drawn increasing attention in both the control and machine learning communities. In this paper, we study their convergence and sample complexity for stabilizing linear time-invariant systems in terms of the number of system rollouts. Our analysis is built upon a discounted linear quadratic regulator (LQR) method which alternately updates the policy and the discount factor of the LQR problem. First, we propose an explicit rule to adaptively adjust the discount factor by exploring the stability margin of a linear control policy. Then, we establish the sample complexity of PG methods for stabilization, which only adds a coefficient logarithmic in the spectral radius of the state matrix to that for solving the LQR problem with a prior stabilizing policy. Finally, we perform simulations to validate our theoretical findings and…
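The alternating scheme described in the abstract can be illustrated on a scalar system. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it uses the exact (model-based) gradient instead of rollout estimates, a simple backtracking step size, and a discount update derived only for this scalar case from the identity J - (q + r k^2) = gamma (a - b k)^2 J. The function names, the initial discount 0.9 / a^2, and the safety factor 0.9 are all hypothetical choices made for illustration.

```python
import math

def discounted_cost(a, b, q, r, k, gamma):
    """Discounted LQR cost from x0 = 1 under u = -k*x (inf if it diverges)."""
    c = a - b * k                    # closed-loop pole
    d = 1.0 - gamma * c * c          # positive iff sqrt(gamma) * |c| < 1
    return (q + r * k * k) / d if d > 0 else math.inf

def cost_gradient(a, b, q, r, k, gamma):
    """Exact derivative of the discounted cost with respect to the gain k."""
    c = a - b * k
    d = 1.0 - gamma * c * c
    n = q + r * k * k
    return (2 * r * k * d - n * 2 * gamma * c * b) / (d * d)

def policy_gradient(a, b, q, r, k, gamma, steps=50, eta=0.1):
    """Gradient descent on k; halve the step until the cost does not increase."""
    for _ in range(steps):
        g = cost_gradient(a, b, q, r, k, gamma)
        J = discounted_cost(a, b, q, r, k, gamma)
        step = eta
        while step > 1e-12:
            k_new = k - step * g
            if discounted_cost(a, b, q, r, k_new, gamma) <= J:
                k = k_new
                break
            step *= 0.5
    return k

def stabilize(a, b, q, r, inner_steps=50, max_rounds=100):
    """Alternate policy updates and discount-factor increases until gamma = 1."""
    k = 0.0
    # Assumption: an upper bound on |a| is known, so the initial cost is finite.
    gamma = 1.0 if abs(a) < 1 else 0.9 / (a * a)
    for _ in range(max_rounds):
        k = policy_gradient(a, b, q, r, k, gamma, inner_steps)
        if gamma >= 1.0:
            break
        J = discounted_cost(a, b, q, r, k, gamma)
        n = q + r * k * k
        slack = J - n                # equals gamma * (a - b*k)**2 * J
        # Illustrative rule: any factor below 1 + n/slack keeps the cost finite.
        gamma = 1.0 if slack <= 1e-12 else min(1.0, gamma * (1 + 0.9 * n / slack))
    return k, gamma

if __name__ == "__main__":
    k, gamma = stabilize(a=2.0, b=1.0, q=1.0, r=1.0)
    print("gain:", k, "closed-loop pole:", 2.0 - k, "final discount:", gamma)
```

The sketch starts from the zero gain, which is not stabilizing for a = 2, and relies on the discount factor to keep the cost finite until the gain stabilizes the undiscounted system. In the sample-based setting studied in the paper, the cost and gradient would be estimated from system rollouts rather than computed from the model.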
Taxonomy
Topics: Machine Learning and ELM · Advanced Memory and Neural Computing · Adaptive Dynamic Programming Control
