A Policy Optimization Method Towards Optimal-time Stability
Shengjie Wang, Fengbo Lan, Xiang Zheng, Yuxue Cao, Oluwatosin Oseni, Haotian Xu, Tao Zhang, Yang Gao

TL;DR
This paper introduces ALAC, a novel reinforcement learning algorithm that ensures systems reach stable equilibrium within optimal time using Lyapunov stability, improving performance on robotic tasks.
Contribution
It proposes a new policy optimization method incorporating sampling-based Lyapunov stability to achieve optimal-time stability in RL.
Findings
ALAC significantly outperforms previous methods across ten robotic tasks.
It effectively guides systems to stable equilibrium within optimal time.
The approach enhances stability and performance in model-free RL.
Abstract
In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively…
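To make the idea of a sampling-based stability criterion concrete, the sketch below estimates a generic Lyapunov decrease condition from a batch of sampled transitions. It is an illustration under stated assumptions, not the ALAC criterion itself: the function names, the quadratic candidate function, and the decay rate `alpha` are all hypothetical and do not come from the paper. The condition checked is that the sample mean of L(s') - L(s) + alpha * L(s) is non-positive.

```python
from statistics import mean


def quadratic_candidate(state):
    """Hypothetical Lyapunov candidate: squared distance to the origin."""
    return sum(x * x for x in state)


def lyapunov_decrease_violation(l_now, l_next, alpha=0.1):
    """Sample mean of L(s') - L(s) + alpha * L(s).

    A non-positive value means the sampled decrease condition holds on
    this batch. `alpha` is an assumed decay rate, not a value from the paper.
    """
    if not l_now or len(l_now) != len(l_next):
        raise ValueError("need two non-empty batches of equal length")
    return mean(ln - lc + alpha * lc for lc, ln in zip(l_now, l_next))


# Toy batch: a contracting system s' = 0.5 * s (assumed dynamics).
states = [(1.0, -2.0), (0.5, 0.5), (-3.0, 1.0)]
next_states = [tuple(0.5 * x for x in s) for s in states]

l_now = [quadratic_candidate(s) for s in states]
l_next = [quadratic_candidate(s) for s in next_states]

violation = lyapunov_decrease_violation(l_now, l_next, alpha=0.1)
condition_holds = violation <= 0.0
```

In an actor-critic setting, a quantity of this kind would typically enter the policy objective as a constraint or penalty term; how ALAC adapts it to obtain optimal-time stability is described in the paper, not in this sketch.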
Taxonomy
Topics: Reinforcement Learning in Robotics
