Challenges in Model Agnostic Controller Learning for Unstable Systems
Mario Sznaier, Mustafa Bozdag

TL;DR
This paper uses a simple example to show that model-agnostic controller learning via direct policy optimization can produce unstable pole/zero cancellations and lose internal stability, and it analyzes several alternatives for avoiding this.
Contribution
It shows, through a simple example, that directly optimizing a performance index can destroy internal stability, and it analyzes several alternatives that avoid this failure.
Findings
Direct optimization of a performance index can lead to unstable pole/zero cancellations.
The resulting loss of internal stability allows arbitrarily small perturbations to produce unbounded outputs.
The paper analyzes several alternatives for avoiding this loss of stability.
Abstract
Model agnostic controller learning, for instance by direct policy optimization, has been the object of renewed attention lately, since it avoids a computationally expensive system identification step. Indeed, direct policy search has been empirically shown to lead to optimal controllers in a number of cases of practical importance. However, to date, these empirical results have not been backed up with a comprehensive theoretical analysis for general problems. In this paper we use a simple example to show that direct policy optimization is not directly generalizable to other seemingly simple problems. In such cases, direct optimization of a performance index can lead to unstable pole/zero cancellations, resulting in the loss of internal stability and unbounded outputs in response to arbitrarily small perturbations. We conclude the paper by analyzing several alternatives to avoid this…
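The failure mode described in the abstract can be illustrated with a minimal numerical sketch. The plant P(s) = 1/(s - 1), the controller C(s) = k(s - 1)/(s + 2), and the gain k = 3 below are hypothetical choices made for illustration only; they are not taken from the paper. The controller zero cancels the unstable plant pole, so the reference-to-output map looks stable, while the full closed-loop characteristic polynomial still contains the root at s = 1.

```python
import numpy as np

# Hypothetical example (not from the paper): unstable plant P(s) = 1 / (s - 1)
num_p = np.array([1.0])
den_p = np.array([1.0, -1.0])

# Hypothetical controller C(s) = k (s - 1) / (s + 2); its zero cancels the plant pole
k = 3.0
num_c = k * np.array([1.0, -1.0])
den_c = np.array([1.0, 2.0])


def closed_loop_poles(num_p, den_p, num_c, den_c):
    """Roots of den_p*den_c + num_p*num_c, computed WITHOUT cancelling common factors."""
    char_poly = np.polyadd(np.polymul(den_p, den_c), np.polymul(num_p, num_c))
    return np.roots(char_poly)


# Internal stability requires every root of the uncancelled polynomial to lie in Re(s) < 0
internal_poles = closed_loop_poles(num_p, den_p, num_c, den_c)
internally_stable = bool(np.all(internal_poles.real < 0))

# After the cancellation the loop gain is k / (s + 2), so the reference-to-output
# map k / (s + 2 + k) has a single stable pole and hides the problem
apparent_poles = np.roots(np.polyadd(den_c, np.array([k])))
apparently_stable = bool(np.all(apparent_poles.real < 0))

print("poles without cancellation:", internal_poles)
print("poles seen from reference to output:", apparent_poles)
print("internally stable:", internally_stable)
```

In this sketch, a performance index that only measures the reference-to-output response would see a stable first-order system, yet a disturbance entering at the plant input passes through (s + 2)/((s - 1)(s + 2 + k)) and grows without bound. Checking the uncancelled characteristic polynomial, rather than the reduced transfer function, is one simple way to detect the problem.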
Taxonomy
Topics: Adaptive Dynamic Programming Control · Model Reduction and Neural Networks · Advanced Control Systems Optimization
