Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(\epsilon)$-Stationary Points

Tomonori Sadamoto; Takashi Tanaka

arXiv:2510.19141 · math.OC · October 23, 2025

Abstract

We study the policy gradient method (PGM) for the linear quadratic Gaussian (LQG) dynamic output-feedback control problem using an input-output-history (IOH) representation of the closed-loop system. First, we show that any dynamic output-feedback controller is equivalent to a static partial-state feedback gain for a new system representation characterized by a finite-length IOH. Leveraging this equivalence, we reformulate the search for an optimal dynamic output-feedback controller as an optimization problem over the corresponding partial-state feedback gain. Next, we introduce a relaxed version of the IOH-based LQG problem by incorporating small process noise with covariance $\epsilon I$ into the new system to ensure coerciveness, a key condition for establishing gradient-based convergence guarantees. Consequently, we show that a vanilla PGM for the relaxed problem converges to an…
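The idea of running a vanilla PGM on a cost whose process-noise covariance is inflated by $\epsilon I$ can be illustrated on an ordinary state-feedback LQ problem. This is a minimal sketch under stated assumptions, not the authors' IOH-based algorithm: it assumes full-state feedback $u = -Kx$, a fixed step size, and hypothetical matrices $A$, $B$, $Q$, $R$, $W$; the helper names (`dlyap`, `cost_and_grad`, `vanilla_pg`, `riccati_gain`) are illustrative. It uses the standard LQ policy-gradient formula $\nabla J(K) = 2[(R + B^\top P_K B)K - B^\top P_K A]\Sigma_K$, where $P_K$ and $\Sigma_K$ solve discrete Lyapunov equations.

```python
import numpy as np


def dlyap(a, q):
    """Solve X = a @ X @ a.T + q by vectorization (adequate for small n)."""
    n = a.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(a, a), q.reshape(-1))
    return x.reshape(n, n)


def cost_and_grad(K, A, B, Q, R, W):
    """Average LQ cost J(K) = tr(P_K W) and its gradient for u = -K x."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf, None  # K is not stabilizing: the cost is infinite
    P = dlyap(Acl.T, Q + K.T @ R @ K)  # P = Acl^T P Acl + Q + K^T R K
    S = dlyap(Acl, W)                  # stationary state covariance Sigma_K
    J = np.trace(P @ W)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S
    return J, grad


def vanilla_pg(K0, A, B, Q, R, W, eps, step, iters):
    """Fixed-step gradient descent on the relaxed cost with noise W + eps*I."""
    W_eps = W + eps * np.eye(A.shape[0])  # relaxation: W + eps*I is positive definite
    K = K0.copy()
    for _ in range(iters):
        _, grad = cost_and_grad(K, A, B, Q, R, W_eps)
        if grad is None:
            raise RuntimeError("iterate left the stabilizing set; reduce step")
        K = K - step * grad
    return K, cost_and_grad(K, A, B, Q, R, W_eps)[0]


def riccati_gain(A, B, Q, R, iters=5000):
    """Reference LQR gain from value iteration on the discrete Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


# Hypothetical example: noise drives only the second state, so W is singular.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
W = np.diag([0.0, 1.0])
K_pg, J_pg = vanilla_pg(np.zeros((1, 2)), A, B, Q, R, W,
                        eps=0.1, step=2e-3, iters=20000)
K_star = riccati_gain(A, B, Q, R)
```

In this state-feedback analogue, $\Sigma_K \succeq W + \epsilon I \succeq \epsilon I$, so $J_\epsilon(K) = \mathrm{tr}\big((Q + K^\top R K)\Sigma_K\big) \ge \epsilon\,\lambda_{\min}(R)\,\|K\|_F^2$ and the relaxed cost grows without bound as $\|K\| \to \infty$, even though $W$ itself is singular. The toy iterate approaches the Riccati gain; the paper's $O(\epsilon)$-stationarity guarantee concerns the IOH-based partial-state problem, which this sketch does not reproduce.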

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research