Escaping High-order Saddles in Policy Optimization for Linear Quadratic Gaussian (LQG) Control
Yang Zheng, Yue Sun, Maryam Fazel, Na Li

TL;DR
This paper proposes a perturbed policy gradient method tailored to LQG control that escapes a large class of suboptimal stationary points, including high-order saddles.
Contribution
It introduces a novel reparameterization and perturbation technique specifically designed to escape high-order saddles in LQG policy optimization.
Findings
The method successfully escapes a broad class of high-order saddle points.
Reparameterization converts high-order saddles into strict saddles for easier escape.
The class of high-order saddles that the algorithm can escape is characterized theoretically.
Abstract
First-order policy optimization has been widely used in reinforcement learning. It is guaranteed to find the optimal policy for the state-feedback linear quadratic regulator (LQR). However, the performance of policy optimization remains unclear for linear quadratic Gaussian (LQG) control, where the LQG cost has spurious suboptimal stationary points. In this paper, we introduce a novel perturbed policy gradient (PGD) method to escape a large class of bad stationary points (including high-order saddles). In particular, based on the specific structure of LQG, we introduce a novel reparameterization procedure which converts the iterate from a high-order saddle to a strict saddle, from which standard random perturbations in PGD can escape efficiently. We further characterize the high-order saddles that can be escaped by our algorithm.
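The random-perturbation step mentioned in the abstract can be illustrated with a minimal sketch of generic perturbed gradient descent on a toy function. This is not the paper's algorithm: the objective `f` below is an assumed two-variable example with a strict saddle at the origin (not an LQG cost), the reparameterization step is not included, and the names `perturbed_gd`, `radius`, `grad_tol`, and `wait` are hypothetical choices made for illustration.

```python
import numpy as np

# Toy objective (an assumption, not an LQG cost):
# f has a strict saddle at (0, 0) and minima at (0, +1) and (0, -1).
def f(z):
    x, y = z
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2

def grad_f(z):
    x, y = z
    return np.array([x, y**3 - y])

def perturbed_gd(z0, step=0.1, radius=1e-2, grad_tol=1e-3,
                 wait=50, iters=2000, seed=0):
    """Gradient descent that adds a small random perturbation whenever
    the gradient is nearly zero (at most once every `wait` iterations)."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    last_perturb = -wait
    for t in range(iters):
        g = grad_f(z)
        if np.linalg.norm(g) < grad_tol and t - last_perturb >= wait:
            # Sample a point uniformly from a ball of the given radius.
            d = rng.normal(size=z.shape)
            d *= radius * rng.uniform() ** (1.0 / z.size) / np.linalg.norm(d)
            z = z + d
            last_perturb = t
            g = grad_f(z)
        z = z - step * g
    return z

# With radius=0 the perturbation is disabled and the iterate stays at the saddle.
z_plain = perturbed_gd([0.0, 0.0], radius=0.0)
# With perturbations the iterate leaves the saddle and settles near a minimum.
z_pert = perturbed_gd([0.0, 0.0])
print("plain GD:", z_plain, "f =", f(z_plain))
print("perturbed GD:", z_pert, "f =", f(z_pert))
```

In this toy setting, plain gradient descent started exactly at the saddle never moves because the gradient is zero there, while the perturbed variant picks up a component along the negative-curvature direction and descends. At a high-order saddle the Hessian has no negative eigenvalue to exploit, which is why the abstract describes first converting the iterate to a strict saddle.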
Taxonomy
Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
