Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence
Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud

TL;DR
This paper analyzes the global convergence of gradient-based policy optimization methods for Markovian jump linear quadratic control, demonstrating their effectiveness and providing theoretical guarantees despite the non-convex landscape.
Contribution
The paper establishes global convergence of the gradient descent, Gauss-Newton, and natural policy gradient methods for quadratic control of Markovian jump linear systems (MJLS), which it presents as a new theoretical result.
Findings
All three methods converge to the optimal controller at a linear rate.
Although the problem is non-convex, the optimization landscape is coercive, gradient dominated, and almost smooth.
Numerical examples support the theoretical results.
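As a rough illustration of why gradient dominance matters for the first finding, the generic argument below shows how such a property yields a linear rate for gradient descent. The constants $\mu$ and $L$ and the step size $\eta$ are placeholders, not the paper's constants, and the sketch assumes an $L$-smooth cost on the relevant sublevel set, whereas the paper works with an almost-smoothness property instead.

```latex
% Gradient dominance (Polyak-Lojasiewicz-type inequality), generic form:
C(K) - C(K^*) \le \frac{1}{2\mu}\,\|\nabla C(K)\|_F^2 .

% One gradient step K^+ = K - \eta \nabla C(K), with 0 < \eta \le 1/L,
% under the (assumed) L-smoothness of C:
C(K^+) \le C(K) - \frac{\eta}{2}\,\|\nabla C(K)\|_F^2 .

% Substituting the first inequality into the second:
C(K^+) - C(K^*) \le (1 - \eta\mu)\,\bigl(C(K) - C(K^*)\bigr).
```

Iterating the last inequality gives a cost gap that shrinks geometrically, which is what "linear convergence" refers to here.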
Abstract
Recently, policy optimization for control purposes has received renewed attention due to the increasing interest in reinforcement learning. In this paper, we investigate the global convergence of gradient-based policy optimization methods for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state feedback controllers and quadratic performance costs. Despite the non-convexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and almost smoothness. Based on these properties, we show global convergence of three types of policy optimization methods: the gradient descent method; the Gauss-Newton method; and the natural policy gradient method. We prove that all three methods converge to the…
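The setup described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it evaluates the quadratic cost of a mode-dependent static state feedback $u_t = -K_{i} x_t$ through the standard coupled Lyapunov recursion and runs plain gradient descent on the gains. The function names (`mjls_cost`, `fd_gradient`, `gradient_descent`), the two-mode scalar system, the step size, and the sign convention for the feedback are all assumptions made for illustration, and the gradient is approximated by central finite differences rather than the analytic expression derived in the paper.

```python
import numpy as np

def mjls_cost(K, A, B, Q, R, Pr, rho, X0, max_iters=1000, tol=1e-12):
    """Cost of mode-dependent gains K[i] under u = -K[i] x (assumed convention).

    Iterates the coupled Lyapunov recursion
        P_i = Q_i + K_i' R_i K_i + (A_i - B_i K_i)' E_i(P) (A_i - B_i K_i),
    with E_i(P) = sum_j Pr[i, j] P_j. The iteration converges only when the
    closed loop is mean-square stable.
    """
    n_modes = len(A)
    P = [np.zeros_like(Q[i]) for i in range(n_modes)]
    for _ in range(max_iters):
        EP = [sum(Pr[i, j] * P[j] for j in range(n_modes)) for i in range(n_modes)]
        P_new = []
        for i in range(n_modes):
            Acl = A[i] - B[i] @ K[i]
            P_new.append(Q[i] + K[i].T @ R[i] @ K[i] + Acl.T @ EP[i] @ Acl)
        delta = max(np.max(np.abs(P_new[i] - P[i])) for i in range(n_modes))
        P = P_new
        if delta < tol:
            break
    # Expected cost for initial mode distribution rho and E[x0 x0'] = X0.
    return float(sum(rho[i] * np.trace(P[i] @ X0) for i in range(n_modes)))

def fd_gradient(K, cost, eps=1e-5):
    """Central finite-difference gradient (a stand-in for the analytic one)."""
    G = np.zeros_like(K)
    for idx in np.ndindex(K.shape):
        Kp, Km = K.copy(), K.copy()
        Kp[idx] += eps
        Km[idx] -= eps
        G[idx] = (cost(Kp) - cost(Km)) / (2 * eps)
    return G

def gradient_descent(K0, cost, step=0.05, n_steps=100):
    """Plain gradient descent on the gains; returns final gains and cost history."""
    K = K0.copy()
    history = [cost(K)]
    for _ in range(n_steps):
        K = K - step * fd_gradient(K, cost)
        history.append(cost(K))
    return K, history

# Hypothetical two-mode scalar system, chosen only for illustration.
A = np.array([[[1.2]], [[0.8]]])
B = np.array([[[1.0]], [[1.0]]])
Q = np.array([[[1.0]], [[1.0]]])
R = np.array([[[1.0]], [[1.0]]])
Pr = np.array([[0.7, 0.3], [0.4, 0.6]])   # mode transition probabilities
rho = np.array([0.5, 0.5])                # initial mode distribution
X0 = np.eye(1)                            # second moment of the initial state

cost = lambda K: mjls_cost(K, A, B, Q, R, Pr, rho, X0)
K0 = np.array([[[1.0]], [[0.5]]])         # mean-square stabilizing initial gains
K_final, history = gradient_descent(K0, cost)
print(history[0], history[-1])
```

Starting from stabilizing gains matters in this sketch: the Lyapunov recursion inside `mjls_cost` diverges for gains that are not mean-square stabilizing, so the cost is only meaningful on the set of stabilizing controllers.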
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Age of Information Optimization
