Escaping High-order Saddles in Policy Optimization for Linear Quadratic Gaussian (LQG) Control

Yang Zheng; Yue Sun; Maryam Fazel; Na Li

arXiv:2204.00912·math.OC·April 5, 2022

TL;DR

This paper proposes a perturbed policy gradient (PGD) method for LQG control that uses a reparameterization step to escape a large class of suboptimal stationary points of the LQG cost, including high-order saddles.

Contribution

The paper introduces a reparameterization procedure, based on the specific structure of LQG, that is combined with standard random perturbations to escape high-order saddles in LQG policy optimization.

Findings

01. The method escapes a broad class of high-order saddle points.

02. Reparameterization converts an iterate at a high-order saddle into one at a strict saddle, which standard random perturbations can then escape.

03. The paper characterizes which high-order saddles the algorithm can escape.
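
The findings rely on the distinction between strict and high-order saddles, which this page does not define. As a reading aid, the standard definitions from the nonconvex optimization literature can be written as follows. The symbols J (a generic smooth cost) and K (a stationary point) are assumed notation for illustration, not taken from this page.

```latex
% Strict saddle: stationary, and the Hessian has a strictly negative eigenvalue
\nabla J(K) = 0, \qquad \lambda_{\min}\!\big(\nabla^2 J(K)\big) < 0

% High-order saddle: stationary, Hessian has no negative eigenvalue,
% yet K is not a local minimum (descent only appears in higher-order terms)
\nabla J(K) = 0, \qquad \nabla^2 J(K) \succeq 0, \qquad K \text{ is not a local minimum}
```

At a strict saddle a small random perturbation lands, with high probability, on a direction of negative curvature. At a high-order saddle the second-order information gives no such direction, which is why a conversion step is needed first.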

Abstract

First-order policy optimization has been widely used in reinforcement learning. It is guaranteed to find the optimal policy for the state-feedback linear quadratic regulator (LQR). However, the performance of policy optimization remains unclear for linear quadratic Gaussian (LQG) control, where the LQG cost has spurious suboptimal stationary points. In this paper, we introduce a novel perturbed policy gradient (PGD) method to escape a large class of bad stationary points (including high-order saddles). In particular, based on the specific structure of LQG, we introduce a novel reparameterization procedure which converts the iterate from a high-order saddle to a strict saddle, from which standard random perturbations in PGD can escape efficiently. We further characterize the high-order saddles that can be escaped by our algorithm.
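
The abstract's last step, escaping a strict saddle with random perturbations, can be illustrated on a toy problem. The sketch below is not the paper's algorithm and does not involve the LQG cost or the reparameterization. It assumes a hypothetical two-variable function with a strict saddle at the origin and minima at (0, ±√2), and all function names, thresholds, and step sizes are illustrative choices.

```python
import math
import random

def f(p):
    # Toy cost (an assumption for illustration): strict saddle at (0, 0),
    # local minima at (0, +sqrt(2)) and (0, -sqrt(2)) with value -1.
    x, y = p
    return x**2 - y**2 + 0.25 * y**4

def grad(p):
    x, y = p
    return (2.0 * x, -2.0 * y + y**3)

def perturbed_gd(start, step=0.05, iters=2000, g_thresh=1e-3,
                 radius=1e-2, wait=50, seed=0):
    """Gradient descent that adds a small random kick when the gradient is
    nearly zero, at most once every `wait` iterations."""
    rng = random.Random(seed)
    x, y = start
    last_kick = -wait
    for t in range(iters):
        gx, gy = grad((x, y))
        if math.hypot(gx, gy) < g_thresh and t - last_kick >= wait:
            # Sample a point uniformly from a disk of the given radius.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            x += r * math.cos(angle)
            y += r * math.sin(angle)
            last_kick = t
            gx, gy = grad((x, y))
        x -= step * gx
        y -= step * gy
    return (x, y)

if __name__ == "__main__":
    stuck = perturbed_gd((0.0, 0.0), radius=0.0)  # no kick: plain gradient descent
    escaped = perturbed_gd((0.0, 0.0))            # with random kicks
    print("plain GD:    ", stuck, f(stuck))
    print("perturbed GD:", escaped, f(escaped))
```

Started exactly at the saddle, plain gradient descent never moves because the gradient is zero there, while the perturbed variant drifts along the negative-curvature direction and settles near one of the two minima. This simplified version keeps perturbing near a minimum as well, so the final iterate is only accurate up to the perturbation radius.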

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control