MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control

Yongwei Zhang; Yuanzhe Xing; Quanyi Liang; Quan Quan; and Zhikun She

arXiv:2512.24955 · cs.LG · March 24, 2026

TL;DR

MSACL introduces a multi-step actor-critic reinforcement learning method that incorporates Lyapunov certificates to ensure exponential stability, improving efficiency, robustness, and generalization in complex control tasks.

Contribution

It presents a novel multi-step RL framework with Lyapunov certificates, enabling stable and efficient learning without elaborate reward engineering.

Findings

01. Consistent performance improvements over baselines.

02. Robustness against environmental uncertainties.

03. Effective generalization to unseen signals.

Abstract

For stabilizing control tasks, model-free reinforcement learning (RL) approaches face numerous challenges, particularly regarding the issues of effectiveness and efficiency in complex high-dimensional environments with limited training data. To address these challenges, we propose Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that integrates exponential stability into off-policy maximum entropy reinforcement learning (MERL). In contrast to existing RL-based approaches that depend on elaborate reward engineering and single-step constraints, MSACL adopts intuitive reward design and exploits multi-step samples to enable exploratory actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize training samples and propose a $\lambda$-weighted aggregation mechanism to learn Lyapunov certificates. Based on…
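The abstract is truncated here and does not define ESLs or the aggregation rule, so the following is only a minimal sketch of how multi-step samples could be labeled against an exponential decay condition and combined with $\lambda$-dependent weights. Everything in it is an assumption made for illustration, not the paper's formulation: the decay test V(x_{t+k}) <= (1 - alpha)^k V(x_t), the geometric weights proportional to lam^(k-1), and the names `stability_labels`, `lambda_weights`, and `aggregated_violation`.

```python
import numpy as np


def stability_labels(v, alpha, n_steps):
    """Hypothetical labels: entry [t, k-1] is 1 when
    V(x_{t+k}) <= (1 - alpha)^k * V(x_t), else 0."""
    v = np.asarray(v, dtype=float)
    horizon = len(v) - n_steps  # number of start indices with a full window
    labels = np.zeros((horizon, n_steps), dtype=int)
    for k in range(1, n_steps + 1):
        bound = (1.0 - alpha) ** k * v[:horizon]
        labels[:, k - 1] = (v[k:k + horizon] <= bound).astype(int)
    return labels


def lambda_weights(n_steps, lam):
    """Assumed geometric weights over step counts 1..n_steps, normalized
    to sum to 1. Requires 0 <= lam < 1."""
    w = (1.0 - lam) * lam ** np.arange(n_steps)
    return w / w.sum()


def aggregated_violation(v, alpha, n_steps, lam):
    """Weighted mean of how far each k-step sample exceeds the decay bound
    (zero when every sample satisfies it)."""
    v = np.asarray(v, dtype=float)
    horizon = len(v) - n_steps
    w = lambda_weights(n_steps, lam)
    viol = np.stack(
        [
            np.maximum(v[k:k + horizon] - (1.0 - alpha) ** k * v[:horizon], 0.0)
            for k in range(1, n_steps + 1)
        ],
        axis=1,
    )
    return float((viol @ w).mean())


if __name__ == "__main__":
    # Candidate Lyapunov values along a trajectory that halves every step.
    values = 0.5 ** np.arange(6)
    print(stability_labels(values, alpha=0.4, n_steps=3))
    print(aggregated_violation(values, alpha=0.4, n_steps=3, lam=0.9))
```

In a learning setup, a quantity like `aggregated_violation` could serve as a penalty on a learned Lyapunov candidate; whether MSACL uses such a penalty is not stated in the text above.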

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Control Systems Optimization