A note on stabilizing reinforcement learning
Pavel Osinenko, Grigory Yaremenko, Ilya Osokin

TL;DR
This paper critically examines a popular stabilizing reinforcement learning approach using actor-critic methods, identifies fundamental issues in its stability analysis, and discusses implications for future research in ensuring RL controller stability.
Contribution
The note highlights a critical flaw in the stability analysis of a common adaptive-control-based RL method and provides a convergence analysis that assumes the environment has already been stabilized.
Findings
Identified a problem in the stability analysis of the actor-critic approach.
Provided a counterexample demonstrating the flaw.
Derived a convergence analysis assuming environment stabilization.
Abstract
Reinforcement learning is a general methodology of adaptive optimal control that has attracted much attention in fields ranging from the video game industry to robot manipulators. Despite its remarkable performance demonstrations, plain reinforcement learning controllers do not guarantee stability, which compromises their applicability in industry. To provide such guarantees, measures have to be taken. This gives rise to what could generally be called stabilizing reinforcement learning. Concrete approaches range from employing human overseers to filter out unsafe actions to formally verified shields and fusion with classical stabilizing controllers. A line of attack that utilizes elements of adaptive control has become fairly popular in recent years. In this note, we critically address such an approach in a fairly general actor-critic setup for nonlinear time-continuous…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
