Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Kaiqing Zhang; Bin Hu; Tamer Başar

arXiv:1910.09496 · math.OC · February 16, 2021


TL;DR

This paper analyzes policy optimization methods for $\mathcal{H}_2$ linear control with $\mathcal{H}_\infty$ robustness, demonstrating their implicit regularization and global convergence despite nonconvexity and lack of coercivity.

Contribution

The paper establishes global convergence of policy optimization algorithms for $\mathcal{H}_2$ control under an $\mathcal{H}_\infty$ robustness constraint, and shows that the iterates are implicitly regularized, so the guarantee holds even though the problem is nonconvex.

Findings

01

Algorithms preserve $\mathcal{H}_\infty$ constraints via implicit regularization.

02

Global convergence to optimal policies with sublinear rates.

03

Potential for super-linear convergence under certain conditions.
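To make the first finding concrete, the following is a minimal sketch of how one might numerically check an $\mathcal{H}_\infty$-norm constraint on a closed-loop system. It is not the paper's method: it assumes a discrete-time closed loop $x_{t+1} = A_{cl} x_t + D w_t$, $z_t = C x_t$, and estimates the peak gain of $C (zI - A_{cl})^{-1} D$ on a frequency grid. The names `A_cl`, `D`, `C`, `gamma`, and both functions are hypothetical and introduced only for illustration. A grid search gives a lower bound on the true norm, so a sharp resonance between grid points can be missed.

```python
import numpy as np

def hinf_norm_grid(A_cl, D, C, num_points=2001):
    """Grid-based lower bound on the H-infinity norm of C (zI - A_cl)^{-1} D.

    Assumes a discrete-time, stable closed loop (spectral radius of A_cl < 1).
    For real matrices the frequency response is symmetric, so [0, pi] suffices.
    """
    n = A_cl.shape[0]
    peak = 0.0
    for theta in np.linspace(0.0, np.pi, num_points):
        z = np.exp(1j * theta)
        # Frequency response at this point on the unit circle.
        T = C @ np.linalg.solve(z * np.eye(n) - A_cl, D)
        # Largest singular value of the (complex) response matrix.
        peak = max(peak, np.linalg.norm(T, 2))
    return peak

def satisfies_hinf_bound(A_cl, D, C, gamma):
    """Hypothetical feasibility check: stable and estimated norm below gamma."""
    stable = np.max(np.abs(np.linalg.eigvals(A_cl))) < 1.0
    return bool(stable and hinf_norm_grid(A_cl, D, C) < gamma)
```

For example, with the scalar system `A_cl = [[0.5]]`, `D = C = [[1.0]]`, the response is $1/(z - 0.5)$, whose peak gain on the unit circle is 2 at $z = 1$, so a bound of `gamma = 2.5` is satisfied and `gamma = 1.5` is not.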

Abstract

Policy optimization (PO) is a key ingredient for reinforcement learning (RL). For control design, certain constraints are usually enforced on the policies to optimize, accounting for stability, robustness, or safety concerns on the system. Hence, PO is by nature a constrained (nonconvex) optimization in most cases, whose global convergence is challenging to analyze in general. More importantly, some constraints that are safety-critical, e.g., the $\mathcal{H}_\infty$-norm constraint that guarantees the system robustness, are difficult to enforce as the PO methods proceed. Recently, policy gradient methods have been shown to converge to the global optimum of linear quadratic regulator (LQR), a classical optimal control problem, without regularizing/projecting the control iterates onto the stabilizing set, its (implicit) feasible set. This striking result is built upon the…
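The LQR result mentioned in the abstract can be illustrated with a small sketch of exact policy gradient descent on a discrete-time LQR problem. It uses the standard model-based gradient expression for $u_t = -K x_t$, namely $\nabla C(K) = 2\big[(R + B^\top P_K B)K - B^\top P_K A\big]\Sigma_K$, and is not taken from this paper. The system matrices, initial gain, step size, and function names below are all assumptions chosen for illustration. No projection onto the stabilizing set is applied; the spectral radius of $A - BK$ is only recorded at each iterate.

```python
import numpy as np

def dlyap(F, W):
    """Solve X = W + F X F^T by vectorization (needs spectral radius of F < 1)."""
    n = F.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(F, F), W.reshape(-1))
    return x.reshape(n, n)

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """LQR cost trace(P_K Sigma0) and its exact gradient for u = -K x."""
    A_cl = A - B @ K
    P = dlyap(A_cl.T, Q + K.T @ R @ K)   # P = Q + K^T R K + A_cl^T P A_cl
    Sigma = dlyap(A_cl, Sigma0)          # Sigma = Sigma0 + A_cl Sigma A_cl^T
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return np.trace(P @ Sigma0), 2.0 * E @ Sigma

def run_policy_gradient(steps=500, eta=1e-3):
    """Plain gradient descent on K, with no projection step (illustrative values)."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed double-integrator-like system
    B = np.array([[0.0], [0.1]])
    Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
    K = np.array([[1.0, 2.0]])               # assumed stabilizing initial gain
    costs, radii = [], []
    for _ in range(steps):
        cost, grad = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
        costs.append(cost)
        radii.append(np.max(np.abs(np.linalg.eigvals(A - B @ K))))
        K = K - eta * grad
    return K, costs, radii

if __name__ == "__main__":
    K, costs, radii = run_policy_gradient()
    print("final gain:", K)
    print("initial cost:", costs[0], "final cost:", costs[-1])
    print("largest spectral radius seen:", max(radii))
```

The step size here is deliberately small. With a large step the iterates can leave the stabilizing set, at which point the Lyapunov solves above no longer represent a finite cost.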
