On the Optimization Landscape of Dynamic Output Feedback Linear Quadratic Control
Jingliang Duan, Wenhan Cao, Yang Zheng, Lin Zhao

TL;DR
This paper analyzes the optimization landscape of dynamic output-feedback linear quadratic regulation (dLQR). It gives theoretical insight into policy gradient methods for this setting, shows that the stationary point is unique when the controller is observable, and establishes conditions under which dLQR and LQG control are equivalent.
Contribution
It characterizes the optimization landscape of dLQR, proves the uniqueness of stationary points under observability, and links dLQR with LQG control for stochastic systems.
Findings
Uniqueness of stationary points for observable dLQR.
Conditions under which dLQR and LQG are equivalent.
Insights into policy gradient algorithm design for partially observed systems.
Abstract
The convergence of policy gradient algorithms hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. One of our core results is the uniqueness of the stationary point of dLQR when it is observable, which…
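To make the coordinate-transformation statement concrete, here is a minimal numerical sketch. It is not taken from the paper. It assumes a discrete-time plant x_{t+1} = A x_t + B u_t, y_t = C x_t, a dynamic controller ξ_{t+1} = A_K ξ_t + B_K y_t, u_t = C_K ξ_t, and a cost J = E[Σ_t x_t' Q x_t + u_t' R u_t] with the expectation taken over the initial plant and controller states. The function names and all numerical values are hypothetical. The sketch applies a similarity transformation T to the controller and compares the cost when the initial-state covariance X0 is held fixed.

```python
import numpy as np

def closed_loop(A, B, C, AK, BK, CK, Q, R):
    """Closed-loop matrices for the stacked state z = [x; xi] (assumed model)."""
    n, nk = A.shape[0], AK.shape[0]
    A_cl = np.block([[A, B @ CK], [BK @ C, AK]])
    Q_cl = np.block([[Q, np.zeros((n, nk))],
                     [np.zeros((nk, n)), CK.T @ R @ CK]])
    return A_cl, Q_cl

def dlqr_cost(A_cl, Q_cl, X0):
    """J = trace(P X0), where P solves P = Q_cl + A_cl' P A_cl."""
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        raise ValueError("closed loop is not stable; the cost is infinite")
    m = A_cl.shape[0]
    # Row-major vectorization: vec(A' P A) = kron(A', A') vec(P).
    M = np.eye(m * m) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(M, Q_cl.reshape(-1)).reshape(m, m)
    return float(np.trace(P @ X0))

def transform_controller(AK, BK, CK, T):
    """Similarity transformation of the controller state: xi -> T xi."""
    Tinv = np.linalg.inv(T)
    return T @ AK @ Tinv, T @ BK, CK @ Tinv

# Hypothetical scalar plant and stabilizing first-order controller.
A, B, C = np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]])
AK, BK, CK = np.array([[0.2]]), np.array([[0.1]]), np.array([[-0.3]])
Q, R = np.array([[1.0]]), np.array([[1.0]])
T = np.array([[2.0]])

AKt, BKt, CKt = transform_controller(AK, BK, CK, T)
orig_cl = closed_loop(A, B, C, AK, BK, CK, Q, R)
trans_cl = closed_loop(A, B, C, AKt, BKt, CKt, Q, R)

# Case 1: random initial controller state (X0 = I). The cost depends on T.
X0_full = np.eye(2)
J_full = dlqr_cost(*orig_cl, X0_full)
J_full_T = dlqr_cost(*trans_cl, X0_full)

# Case 2: controller always starts at zero. The cost is invariant under T.
X0_x = np.diag([1.0, 0.0])
J_x = dlqr_cost(*orig_cl, X0_x)
J_x_T = dlqr_cost(*trans_cl, X0_x)

print("X0 = I:          ", J_full, J_full_T)
print("X0 = diag(1, 0): ", J_x, J_x_T)
```

Under these assumptions, the transformed controller realizes the same input-output map, yet its cost differs once the controller's initial state carries nonzero variance. This is the sense in which the cost depends on the choice of coordinates.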
Taxonomy
Topics: Age of Information Optimization · Advanced Bandit Algorithms Research
