Policy Gradient Methods for Designing Dynamic Output Feedback Controllers

Tomonori Sadamoto; Takumi Hirai

arXiv:2210.09735 · eess.SY · July 26, 2024

TL;DR

This paper introduces policy gradient methods for designing dynamic output feedback controllers for discrete-time partially observable systems. It provides a model-based method with a global linear convergence guarantee and two model-free implementations intended for practical use.

Contribution

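It shows that designing any dynamic output feedback controller is equivalent to designing a state-feedback controller for an auxiliary system whose state is a finite-length input-output history (IOH), and it develops convergent policy gradient algorithms for controller synthesis, including model-free variants.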
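To make the equivalence concrete, here is a minimal sketch assuming a scalar output and a first-order dynamic controller xi_{t+1} = alpha*xi_t + beta*y_t, u_t = gamma*xi_t + delta*y_t with zero initial state. Eliminating xi gives u_t = delta*y_t + alpha*u_{t-1} + (gamma*beta - alpha*delta)*y_{t-1}, so the same controller is a static gain acting on a short window of past inputs and outputs. The class name `IOHController`, the window layout (two past inputs, then the previous and current output) and the parameter values are illustrative assumptions, not the paper's construction.

```python
from collections import deque


class IOHController:
    """Hypothetical helper: a static gain applied to a finite input-output history."""

    def __init__(self, gain, length):
        if len(gain) != 2 * length:
            raise ValueError("gain must have 2*length entries")
        self.gain = list(gain)
        # Zero pre-history, matching a dynamic controller started from xi_0 = 0.
        self.u_hist = deque([0.0] * length, maxlen=length)  # u_{t-length}, ..., u_{t-1}
        self.y_hist = deque([0.0] * length, maxlen=length)  # y_{t-length+1}, ..., y_t

    def step(self, y):
        self.y_hist.append(y)  # include the newest output y_t in the history
        zeta = list(self.u_hist) + list(self.y_hist)  # the IOH "state"
        u = sum(k * z for k, z in zip(self.gain, zeta))  # static feedback on the IOH
        self.u_hist.append(u)
        return u


def dynamic_controller(ys, alpha, beta, gamma, delta):
    """Reference first-order dynamic output feedback controller."""
    xi, us = 0.0, []
    for y in ys:
        us.append(gamma * xi + delta * y)
        xi = alpha * xi + beta * y
    return us


alpha, beta, gamma, delta = 0.5, 1.0, -0.8, 0.3
ys = [1.0, -0.4, 0.7, 0.2, -1.1]
# Gain order follows zeta = [u_{t-2}, u_{t-1}, y_{t-1}, y_t].
ioh = IOHController([0.0, alpha, gamma * beta - alpha * delta, delta], length=2)
us_ioh = [ioh.step(y) for y in ys]
us_dyn = dynamic_controller(ys, alpha, beta, gamma, delta)
```

Both controllers produce the same input sequence for any output sequence, which is the direction "dynamic controller to IOH state feedback" of the equivalence in its simplest form.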

Findings

01. The model-based PGM converges globally at a linear rate.
02. The model-free PGMs are effective, and their sample complexity is analyzed.
03. Numerical simulations validate the proposed methods.
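The linear convergence claim can be illustrated, under heavy simplification, on a scalar state-feedback LQR problem: the cost J(k) of the policy u = -k*x is known in closed form, and plain gradient descent on k converges geometrically to the Riccati-optimal gain. This is a state-feedback sketch in the spirit of a model-based PGM, not the paper's IOH-based algorithm; the plant parameters `a`, `b`, `q`, `r`, `sigma`, the step size `eta` and the helper names are assumptions chosen for illustration.

```python
# Scalar LQR: x_{t+1} = a*x_t + b*u_t, u_t = -k*x_t, E[x_0^2] = sigma (assumed values)
a, b, q, r, sigma = 1.2, 1.0, 1.0, 1.0, 1.0


def cost(k):
    """Infinite-horizon LQR cost J(k); finite only if |a - b*k| < 1."""
    c = a - b * k
    if abs(c) >= 1.0:
        return float("inf")
    return sigma * (q + r * k * k) / (1.0 - c * c)


def grad(k):
    """Exact derivative dJ/dk; it uses the model (a, b), hence 'model-based'."""
    c = a - b * k
    num, den = q + r * k * k, 1.0 - c * c
    return sigma * (2.0 * r * k * den - num * 2.0 * b * c) / den ** 2


def riccati_gain(iters=1000):
    """Optimal gain from the scalar discrete-time Riccati equation (reference only)."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


k_star = riccati_gain()
k, eta = 1.0, 0.1  # k = 1.0 is stabilizing because |1.2 - 1.0| < 1
errors = []
for _ in range(60):
    k -= eta * grad(k)
    errors.append(abs(k - k_star))  # shrinks by a roughly constant factor per step
```

The error sequence decays geometrically, which is what a Polyak-Łojasiewicz-type inequality on the cost guarantees for gradient descent with a suitably small step size.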

Abstract

This paper proposes model-based and model-free policy gradient methods (PGMs) for designing dynamic output feedback controllers for discrete-time partially observable systems. To this end, we first show that any dynamic output feedback controller design is equivalent to a state-feedback controller design for a newly introduced system whose internal state is a finite-length input-output history (IOH). Next, based on this equivalence, we propose a model-based PGM and show its global linear convergence by proving that the Polyak-Łojasiewicz inequality holds for a reachability-based lossless projection of the IOH dynamics. Moreover, we propose two model-free implementations of the PGM: the multi-episodic and the single-episodic PGMs. The former is a Monte Carlo approximation of the model-based PGM, whereas the latter is a simplified version of the former for ease of use in real systems. A…
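A model-free method replaces the exact gradient with an estimate built only from observed episode costs. The sketch below uses a generic two-point zeroth-order (smoothing) estimator averaged over several simulated episodes, one common way to form a Monte Carlo gradient approximation. It is not the paper's multi-episodic or single-episodic PGM, and it acts on a scalar state rather than an IOH; the plant, `HORIZON`, the smoothing radius, the episode count and the step size are illustrative assumptions.

```python
import math
import random

# Assumed scalar plant x_{t+1} = a*x_t + b*u_t with stage cost q*x^2 + r*u^2
a, b, q, r = 1.2, 1.0, 1.0, 1.0
HORIZON = 50


def episode_cost(k, x0):
    """Run one closed-loop episode of u = -k*x and return the summed stage cost."""
    x, total = x0, 0.0
    for _ in range(HORIZON):
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total


def zeroth_order_grad(k, rng, episodes=50, smoothing=0.01):
    """Two-point Monte Carlo gradient estimate that uses only episode costs."""
    g = 0.0
    for _ in range(episodes):
        x0 = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))  # E[x0^2] = 1
        s = rng.choice((-1.0, 1.0))  # random perturbation direction
        # Same x0 for both perturbed episodes (common random numbers).
        diff = episode_cost(k + smoothing * s, x0) - episode_cost(k - smoothing * s, x0)
        g += diff / (2.0 * smoothing) * s
    return g / episodes


rng = random.Random(0)
k, eta = 1.0, 0.1  # stabilizing initial gain, assumed step size
for _ in range(100):
    k -= eta * zeroth_order_grad(k, rng)
```

For this plant the gain should settle near the Riccati-optimal value of about 0.79. Sharing the initial state between the two perturbed episodes keeps the variance of the estimate small; the remaining randomness only rescales the gradient, so the iteration still contracts toward the optimum.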

Taxonomy

Topics: Advanced Control Systems Optimization · Real-time simulation and control systems

Methods: Policy gradient methods (PGMs)