Globally Convergent Policy Gradient Methods for Linear Quadratic Control of Partially Observed Systems
Feiran Zhao, Xingyun Fu, Keyou You

TL;DR
This paper introduces a new policy parameterization for partially observed linear systems, proving global convergence of policy gradient methods and revealing the relation between initial policies and solutions.
Contribution
It proposes a novel parameterization using finite-length input-output trajectories and establishes global convergence guarantees for policy gradient methods.
Findings
Gradient dominance property ensures convergence
Solution set is invariant to similarity transformations
Simulations validate theoretical results
Abstract
While the optimization landscape of policy gradient methods has recently been investigated for partially observed linear systems, in terms of both static output feedback and dynamical controllers, existing results only provide convergence guarantees to stationary points. In this paper, we propose a new policy parameterization for partially observed linear systems, using a past input-output trajectory of finite length as feedback. We show that the solution set to the parameterized optimization problem is a matrix space, which is invariant to similarity transformations. By proving a gradient dominance property, we show the global convergence of policy gradient methods. Moreover, we observe that the gradient is orthogonal to the solution set, revealing an explicit relation between the resulting solution and the initial policy. Finally, we perform simulations to validate our theoretical results.
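To make the parameterization in the abstract concrete, the following is a minimal sketch of a controller that feeds back a finite window of past inputs and outputs. It is an illustration under stated assumptions, not the authors' implementation: the function name `rollout_cost`, the window length `p`, the ordering of the stacked vector `z`, the zero-padded initial history, and the output-weighted finite-horizon cost are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

def rollout_cost(A, B, C, K, p, x0, Q, R, T):
    """Finite-horizon cost of the policy u_t = K z_t (hypothetical sketch).

    z_t stacks the last p inputs followed by the last p outputs,
    oldest first; the history is zero-padded before time 0 (assumption).
    """
    m = B.shape[1]           # input dimension
    d = C.shape[0]           # output dimension
    u_hist = np.zeros((p, m))
    y_hist = np.zeros((p, d))
    x = np.asarray(x0, dtype=float).copy()
    cost = 0.0
    for _ in range(T):
        y = C @ x                                   # current measurement
        z = np.concatenate([u_hist.ravel(), y_hist.ravel()])
        u = K @ z                                   # feedback on past data only
        cost += float(y @ Q @ y + u @ R @ u)        # assumed quadratic stage cost
        # Slide the window: drop the oldest sample, append the newest.
        u_hist = np.vstack([u_hist[1:], u])
        y_hist = np.vstack([y_hist[1:], y])
        x = A @ x + B @ u                           # state update (noise-free)
    return cost

# Scalar example: stable plant, window length 1, gain acting on [u_{t-1}, y_{t-1}].
A = np.array([[0.5]]); B = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.eye(1); R = np.eye(1)
K = np.array([[0.0, -0.5]])
print(rollout_cost(A, B, C, K, p=1, x0=[1.0], Q=Q, R=R, T=3))
```

In this sketch the gain `K` has shape `(m, p * (m + d))`, so a policy gradient method would search directly over that matrix rather than over the parameters of a dynamical controller.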
Taxonomy
Topics: Advanced Control Systems Optimization · Control and Stability of Dynamical Systems · Fuel Cells and Related Materials
