Policy Gradient Methods for Designing Dynamic Output Feedback Controllers
Tomonori Sadamoto, Takumi Hirai

TL;DR
This paper introduces policy gradient methods for designing dynamic output feedback controllers for discrete-time partially observable systems. It provides a model-based method with a global linear convergence guarantee and two model-free implementations suited to practical use.
Contribution
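It shows that designing a dynamic output feedback controller is equivalent to designing a state-feedback controller for a new system whose internal state is a finite-length input-output history (IOH). It also develops convergent policy gradient algorithms for this synthesis problem, including model-free variants.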
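To make the equivalence concrete, here is a minimal sketch. It assumes a noise-free LTI plant x_{t+1} = A x_t + B u_t, y_t = C x_t with (C, A) observable and a history length ell at least the observability index. The stacking order z_t = [u_{t-ell}; ...; u_{t-1}; y_{t-ell}; ...; y_{t-1}] and the helper name ioh_lift are illustrative assumptions, not the paper's notation. The helper builds matrices Az, Bz with z_{t+1} = Az z_t + Bz u_t, plus a matrix M that recovers x_t = M z_t. A static gain u_t = K z_t on this history is therefore a dynamic output feedback controller whose memory is the history buffer. Note that z_t is fixed by x_{t-ell} and the past inputs, so it only ranges over a subspace of dimension at most n + m*ell. This is plausibly why the abstract works with a reachability-based projection of the IOH dynamics.

```python
import numpy as np


def ioh_lift(A, B, C, ell):
    """Hypothetical helper: lift x+ = A x + B u, y = C x to the IOH state
    z_t = [u_{t-ell}; ...; u_{t-1}; y_{t-ell}; ...; y_{t-1}].

    Returns (Az, Bz, M) with z_{t+1} = Az z_t + Bz u_t and x_t = M z_t.
    Assumes no noise, (C, A) observable and ell >= observability index.
    """
    n, m = B.shape
    p = C.shape[0]
    Ak = [np.linalg.matrix_power(A, k) for k in range(ell + 1)]
    # Stacked past outputs satisfy Y = O x_{t-ell} + T U
    O = np.vstack([C @ Ak[k] for k in range(ell)])
    T = np.zeros((p * ell, m * ell))
    for i in range(ell):
        for j in range(i):
            T[i * p:(i + 1) * p, j * m:(j + 1) * m] = C @ Ak[i - j - 1] @ B
    # Current state: x_t = A^ell x_{t-ell} + Ctrb U
    Ctrb = np.hstack([Ak[ell - 1 - j] @ B for j in range(ell)])
    # Eliminate x_{t-ell} via the left inverse of O (exact if O has full column rank)
    O_left = np.linalg.pinv(O)
    M = np.hstack([Ctrb - Ak[ell] @ O_left @ T, Ak[ell] @ O_left])

    mu, py = m * ell, p * ell
    Az = np.zeros((mu + py, mu + py))
    Bz = np.zeros((mu + py, m))
    Az[:mu - m, m:mu] = np.eye(mu - m)            # drop oldest input, shift the rest
    Bz[mu - m:mu, :] = np.eye(m)                  # newest input u_t enters the buffer
    Az[mu:mu + py - p, mu + p:] = np.eye(py - p)  # drop oldest output, shift the rest
    Az[mu + py - p:, :] = C @ M                   # newest output y_t = C x_t = C M z_t
    return Az, Bz, M


# Check on a random plant (illustrative data only)
rng = np.random.default_rng(0)
n, m, p, ell = 3, 1, 1, 3
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Az, Bz, M = ioh_lift(A, B, C, ell)

x = rng.standard_normal(n)
xs, us, ys = [], [], []
for t in range(10):
    u = rng.standard_normal(m)
    xs.append(x); us.append(u); ys.append(C @ x)
    x = A @ x + B @ u


def ioh(t):
    """IOH vector z_t assembled from the logged data (requires t >= ell)."""
    return np.concatenate(us[t - ell:t] + ys[t - ell:t])


state_err = max(np.abs(M @ ioh(t) - xs[t]).max() for t in range(ell, 10))
step_err = max(np.abs(Az @ ioh(t) + Bz @ us[t] - ioh(t + 1)).max() for t in range(ell, 9))
print(state_err, step_err)
```

Both errors should sit at round-off level whenever the observability matrix O has full column rank. If ell is shorter than the observability index, the pseudo-inverse step no longer recovers x_t exactly.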
Findings
The model-based PGM converges globally at a linear rate.
The model-free PGMs are effective and come with a sample complexity analysis.
Numerical simulations validate the proposed methods.
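The linear-rate finding can be illustrated, under assumptions, on ordinary state-feedback LQR, which is the kind of problem the IOH reformulation reduces to. The sketch below runs plain gradient descent K <- K - eta * grad J(K). It uses the standard exact gradient of the discrete-time LQR cost J(K) = tr(P_K S0) for u = -K x, namely grad J(K) = 2((R + B^T P_K B) K - B^T P_K A) Sigma_K, where P_K and Sigma_K solve closed-loop Lyapunov equations. The plant, weights, step size and helper names (dlyap, lqr_cost_and_grad, riccati) are illustrative assumptions. This is not the paper's algorithm, which operates on the IOH state and relies on a projection for its convergence proof.

```python
import numpy as np


def dlyap(F, W):
    """Solve X = F X F^T + W by vectorisation (adequate for small matrices)."""
    n = F.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(F, F), W.reshape(-1))
    return x.reshape(n, n)


def lqr_cost_and_grad(K, A, B, Q, R, S0):
    """Cost J(K) = tr(P_K S0) of the policy u = -K x and its exact gradient in K."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1:
        raise ValueError("K is not stabilising")
    P = dlyap(Acl.T, Q + K.T @ R @ K)  # value matrix P_K
    S = dlyap(Acl, S0)                 # accumulated state correlation Sigma_K
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S
    return np.trace(P @ S0), grad


def riccati(A, B, Q, R, iters=500):
    """Fixed-point iteration of the discrete-time Riccati equation (reference optimum)."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ G
    return P


# Illustrative plant: decoupled and open-loop stable, so K = 0 is a valid start
A = np.diag([0.9, 0.5])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)
S0 = np.eye(2)

K = np.zeros((2, 2))
eta = 0.01  # fixed step size, small enough for this example
costs = []
for _ in range(1000):
    J, grad = lqr_cost_and_grad(K, A, B, Q, R, S0)
    costs.append(J)
    K = K - eta * grad

J_star = np.trace(riccati(A, B, Q, R) @ S0)
print(costs[0], costs[-1], J_star)
```

The plant is decoupled only so the run is easy to check by hand. In this run the recorded costs decrease monotonically toward the Riccati optimum J_star, and the gap shrinks geometrically, which is the behaviour a gradient-dominance (Polyak-Łojasiewicz) property predicts.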
Abstract
This paper proposes model-based and model-free policy gradient methods (PGMs) for designing dynamic output feedback controllers for discrete-time partially observable systems. To this end, we first show that any dynamic output feedback controller design is equivalent to a state-feedback controller design for a newly introduced system whose internal state is a finite-length input-output history (IOH). Next, based on this equivalence, we propose a model-based PGM and show its global linear convergence by proving that the Polyak-Łojasiewicz inequality holds for a reachability-based lossless projection of the IOH dynamics. Moreover, we propose two model-free implementations of the PGM: the multi-episodic and the single-episodic PGM. The former is a Monte Carlo approximation of the model-based PGM, whereas the latter is a simplified version of the former for ease of use in real systems. A…
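The abstract describes the multi-episodic PGM as a Monte Carlo approximation of the model-based PGM, but the excerpt stops before giving its details. The block below is a hedged sketch of one standard way to build such an approximation. It uses a two-point zeroth-order estimator with random directions on the unit sphere, where every cost value comes from episode rollouts of a plant treated as a black box. The plant, the weights Q = R = I, the episode design, and all function names (simulate, episode_cost, zeroth_order_grad, finite_difference_grad) are assumptions for illustration. This is not the authors' estimator. A coordinate-wise finite-difference gradient of the same rollout cost serves only as a reference.

```python
import numpy as np

# Black-box plant, accessed only through `simulate` (illustrative, assumed dynamics)
A = np.array([[0.9, 0.2], [0.0, 0.5]])
B = np.eye(2)


def simulate(x, u):
    return A @ x + B @ u


def episode_cost(K, horizon=60):
    """Sum of finite-horizon costs x'x + u'u (Q = R = I assumed) over episodes
    started at each basis vector, so the initial-state covariance is I."""
    n = K.shape[1]
    total = 0.0
    for i in range(n):
        x = np.eye(n)[i]
        for _ in range(horizon):
            u = -K @ x
            total += x @ x + u @ u
            x = simulate(x, u)
    return total


def zeroth_order_grad(cost, K, num_dirs=500, radius=0.01, rng=None):
    """Two-point Monte Carlo gradient estimate built from cost evaluations only."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(K)
    for _ in range(num_dirs):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # uniform random direction on the Frobenius unit sphere
        g += (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    return K.size * g / num_dirs  # E[d (g.U) U] = g for U uniform on the sphere


def finite_difference_grad(cost, K, h=1e-5):
    """Coordinate-wise central differences, used here only as a reference."""
    g = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        E = np.zeros_like(K)
        E[idx] = h
        g[idx] = (cost(K + E) - cost(K - E)) / (2 * h)
    return g


K = 0.3 * np.eye(2)  # stabilising gain: A - B K has eigenvalues 0.6 and 0.2
g_mc = zeroth_order_grad(episode_cost, K, rng=np.random.default_rng(1))
g_fd = finite_difference_grad(episode_cost, K)
cosine = np.sum(g_mc * g_fd) / (np.linalg.norm(g_mc) * np.linalg.norm(g_fd))
print(cosine)
```

Each estimate costs 2 * num_dirs cost evaluations, which is 2 * n * num_dirs episodes in this sketch. A gradient step K - eta * g_mc then mirrors the model-based update. The estimator's variance shrinks roughly as (d - 1) / num_dirs relative to the squared gradient norm, where d = K.size.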
