Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence

Joao Paulo Jansch-Porto; Bin Hu; Geir Dullerud

arXiv:2011.11852 · math.OC · November 25, 2020 · 6 cites

Open Access

TL;DR

This paper proves that gradient descent, Gauss-Newton, and natural policy gradient methods converge globally for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS), even though the underlying optimization problem is non-convex.

Contribution

The paper establishes global convergence of the gradient descent, Gauss-Newton, and natural policy gradient methods for direct policy optimization of MJLS with static state feedback controllers and quadratic costs.

Findings

01 All three methods converge linearly to the optimal controller

02 The optimization landscape, although non-convex, is coercive, gradient dominated, and almost smooth

03 Numerical examples support the theoretical results
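
To see why gradient dominance is tied to a linear rate, the following is a simplified worked argument rather than the paper's proof. It assumes the cost C is globally L-smooth and that the step size is 1/L; the constants \mu and L and the symbol K^* (an optimal gain) are illustrative. The abstract below names a weaker property, almost smoothness, which this sketch replaces with plain smoothness for brevity.

```latex
\begin{align*}
% Gradient dominance with constant \mu > 0 (assumed form):
C(K) - C(K^*) &\le \mu \,\|\nabla C(K)\|_F^2 , \\
% One gradient step K' = K - \tfrac{1}{L}\nabla C(K) under L-smoothness (assumed):
C(K') &\le C(K) - \frac{1}{2L}\,\|\nabla C(K)\|_F^2 , \\
% Substituting the first inequality into the second:
C(K') - C(K^*) &\le \Bigl(1 - \frac{1}{2L\mu}\Bigr)\bigl(C(K) - C(K^*)\bigr).
\end{align*}
```

Under these assumptions the optimality gap shrinks by a fixed factor at every step, which is what a linear convergence rate means.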

Abstract

Recently, policy optimization for control purposes has received renewed attention due to the increasing interest in reinforcement learning. In this paper, we investigate the global convergence of gradient-based policy optimization methods for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state feedback controllers and quadratic performance costs. Despite the non-convexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and almost smoothness. Based on these properties, we show global convergence of three types of policy optimization methods: the gradient descent method; the Gauss-Newton method; and the natural policy gradient method. We prove that all three methods converge to the…
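
As a rough illustration of the gradient descent method mentioned in the abstract, here is a minimal sketch for a scalar MJLS with two modes. Everything in it is an assumption made for illustration: the system numbers, the function names `cost_and_gradient` and `gradient_descent`, the fixed step size, and the use of plain fixed-point iteration to solve the coupled Lyapunov equations. The gradient expression is obtained by differentiating the scalar coupled Lyapunov recursion and is not quoted from the paper. The initial state is assumed to have unit second moment and to be independent of the initial mode.

```python
# Sketch: policy gradient descent for a scalar two-mode MJLS.
# Dynamics: x_{t+1} = a[i] * x_t + b[i] * u_t in mode i, with u_t = -k[i] * x_t.
# All numeric values below are hypothetical.

def cost_and_gradient(k, a, b, q, r, p, rho, iters=200):
    """Return (cost, gradient) for mode-dependent scalar gains k.

    Assumes the gains are mean-square stabilizing, so the fixed-point
    iterations below converge.
    """
    n = len(k)
    # Squared closed-loop multiplier for each mode.
    c2 = [(a[i] - b[i] * k[i]) ** 2 for i in range(n)]

    # Coupled Lyapunov recursion: P_i = q_i + r_i k_i^2 + c_i^2 * sum_j p_ij P_j.
    P = [0.0] * n
    for _ in range(iters):
        E = [sum(p[i][j] * P[j] for j in range(n)) for i in range(n)]
        P = [q[i] + r[i] * k[i] ** 2 + c2[i] * E[i] for i in range(n)]
    E = [sum(p[i][j] * P[j] for j in range(n)) for i in range(n)]

    # X_i = sum over t of E[x_t^2 * 1{mode_t = i}], accumulated term by term.
    x = list(rho)
    X = [0.0] * n
    for _ in range(iters):
        X = [X[i] + x[i] for i in range(n)]
        x = [sum(p[i][j] * c2[i] * x[i] for i in range(n)) for j in range(n)]

    cost = sum(rho[i] * P[i] for i in range(n))
    grad = [
        2.0 * ((r[i] + b[i] ** 2 * E[i]) * k[i] - b[i] * E[i] * a[i]) * X[i]
        for i in range(n)
    ]
    return cost, grad


def gradient_descent(k0, a, b, q, r, p, rho, step=0.05, steps=300):
    """Run fixed-step gradient descent and record the cost before each update."""
    k = list(k0)
    costs = []
    for _ in range(steps):
        cost, grad = cost_and_gradient(k, a, b, q, r, p, rho)
        costs.append(cost)
        k = [k[i] - step * grad[i] for i in range(len(k))]
    return k, costs


# Hypothetical two-mode example.
a = [1.2, 0.8]                  # open-loop dynamics per mode
b = [1.0, 0.5]                  # input gains per mode
q = [1.0, 1.0]                  # state cost weights
r = [1.0, 1.0]                  # input cost weights
p = [[0.7, 0.3], [0.4, 0.6]]    # mode transition probabilities (rows sum to 1)
rho = [0.5, 0.5]                # initial mode distribution
k_init = [1.0, 0.5]             # stabilizing initial gains (closed loop 0.2 and 0.55)

k_final, costs = gradient_descent(k_init, a, b, q, r, p, rho)
final_cost, final_grad = cost_and_gradient(k_final, a, b, q, r, p, rho)
print("initial cost:", costs[0])
print("final cost:", final_cost)
print("final gains:", k_final)
```

The sketch starts from stabilizing gains because the fixed-point iterations only converge when the closed loop is mean-square stable. A production implementation would more likely solve the coupled Lyapunov equations with a linear solver and choose the step size from problem constants rather than by hand.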

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Age of Information Optimization