Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence
Kaiqing Zhang, Bin Hu, Tamer Başar

TL;DR
This paper analyzes policy optimization methods for $\mathcal{H}_2$ linear control with $\mathcal{H}_\infty$ robustness, demonstrating their implicit regularization and global convergence despite nonconvexity and lack of coercivity.
Contribution
It establishes the convergence of policy optimization algorithms for $\mathcal{H}_2$ control with $\mathcal{H}_\infty$ robustness, highlighting their implicit regularization and overcoming the challenges posed by nonconvexity.
Findings
Algorithms preserve $\mathcal{H}_\infty$ constraints via implicit regularization.
Global convergence to optimal policies with sublinear rates.
Potential for super-linear convergence under certain conditions.
Abstract
Policy optimization (PO) is a key ingredient for reinforcement learning (RL). For control design, certain constraints are usually enforced on the policies to optimize, accounting for either the stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the $\mathcal{H}_\infty$-norm constraint that guarantees the system robustness, are difficult to enforce as the PO methods proceed. Recently, policy gradient methods have been shown to converge to the global optimum of linear quadratic regulator (LQR), a classical optimal control problem, without regularizing/projecting the control iterates onto the stabilizing set, its (implicit) feasible set. This striking result is built upon the…
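The LQR observation mentioned in the abstract can be illustrated with a minimal sketch. It runs plain gradient descent on the standard discrete-time LQR cost $C(K) = \mathrm{tr}(P_K \Sigma_0)$ for the policy $u_t = -K x_t$, with no projection step, and records the closed-loop spectral radius at every iterate. This is the plain LQR setting only, not the paper's mixed $\mathcal{H}_2/\mathcal{H}_\infty$ algorithms. The scalar system, initial gain, step size, and helper names (`dlyap`, `lqr_cost_and_grad`, `spectral_radius`) are all assumptions chosen for illustration.

```python
import numpy as np

def dlyap(M, W):
    """Solve X = M X M^T + W by vectorization (adequate for small systems)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), W.flatten())
    return x.reshape(n, n)

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost trace(P_K Sigma0) and its exact gradient for u_t = -K x_t.

    Only meaningful when A - B K is stable (spectral radius < 1).
    """
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # value matrix P_K
    Sigma = dlyap(Acl, Sigma0)          # accumulated state correlation
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return np.trace(P @ Sigma0), grad

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

# Hypothetical scalar example: the open loop (a = 1.1) is unstable.
A = np.array([[1.1]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
Sigma0 = np.array([[1.0]])

K = np.array([[0.5]])   # assumed stabilizing initial gain
step = 0.05             # assumed step size, small enough for this example

costs, radii = [], []
for _ in range(200):
    cost, grad = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
    costs.append(cost)
    radii.append(spectral_radius(A - B @ K))
    K = K - step * grad  # plain gradient step, no projection

print("final gain:", K[0, 0])
print("largest closed-loop spectral radius seen:", max(radii))
```

In this toy run every iterate stays inside the stabilizing set even though nothing enforces it explicitly. For the $\mathcal{H}_\infty$-norm constraint, the paper's stated finding is that its algorithms preserve the constraint through implicit regularization.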
