A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability

Wenhan Xu; Jiashuo Jiang; Lei Deng; Danny Hin-Kwok Tsang

arXiv:2506.04291 · cs.LG · June 6, 2025

TL;DR

This paper introduces LDPTRLQ, a Lyapunov Drift-Plus-Penalty algorithm tailored for reinforcement learning. It balances queue stability against long-term optimization, and the authors support it with theoretical analysis and with simulations in which it outperforms baseline methods.

Contribution

The paper develops a new LDPTRLQ algorithm that integrates Lyapunov Drift-Plus-Penalty with RL, providing theoretical guarantees and improved stability over existing methods.

Findings

01. LDPTRLQ outperforms baseline methods in simulations.
02. The algorithm effectively balances queue stability and long-term rewards.
03. Theoretical analysis confirms the superiority of the proposed method.

Abstract

With the proliferation of Internet of Things (IoT) devices, the demand for addressing complex optimization challenges has intensified. The Lyapunov Drift-Plus-Penalty algorithm is a widely adopted approach for ensuring queue stability, and some studies have begun to explore its integration with reinforcement learning (RL). In this paper, we investigate how to adapt the Lyapunov Drift-Plus-Penalty algorithm to RL applications and, through rigorous theoretical analysis, derive an effective method for combining the two under a set of common and reasonable conditions. Unlike existing approaches that directly merge the two frameworks, our proposed algorithm, termed the Lyapunov drift-plus-penalty method tailored for reinforcement learning with queue stability (LDPTRLQ), offers theoretical superiority by effectively balancing the greedy optimization…
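For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the classic drift-plus-penalty rule for a single queue. It is not the paper's LDPTRLQ algorithm. With the quadratic Lyapunov function L(Q) = Q²/2, the one-slot drift is bounded by a constant plus Q·(arrival − service), so the rule greedily picks the action minimizing V·penalty − Q·service (arrivals do not depend on the action and drop out). The action set, the arrival rate, and the names `dpp_action` and `simulate` are assumptions made for illustration only.

```python
import random

# Hypothetical action set: (penalty, service) pairs, e.g. power cost vs. packets served.
ACTIONS = [(0.0, 0.0), (1.0, 1.0), (3.0, 2.0)]

def dpp_action(queue, actions, V):
    """Classic drift-plus-penalty choice: minimize V * penalty - queue * service."""
    return min(actions, key=lambda a: V * a[0] - queue * a[1])

def simulate(V, steps=10000, seed=0, arrival_rate=0.6):
    """Run one queue under the greedy rule; return (final backlog, average penalty)."""
    rng = random.Random(seed)
    queue, total_penalty = 0.0, 0.0
    for _ in range(steps):
        penalty, service = dpp_action(queue, ACTIONS, V)
        # Assumed Bernoulli arrivals: one packet with probability arrival_rate.
        arrival = 1.0 if rng.random() < arrival_rate else 0.0
        # Standard queue update: serve first, then admit the new arrival.
        queue = max(queue - service, 0.0) + arrival
        total_penalty += penalty
    return queue, total_penalty / steps

if __name__ == "__main__":
    for V in (1.0, 5.0, 20.0):
        backlog, avg_penalty = simulate(V)
        print(f"V={V}: final backlog={backlog}, average penalty={avg_penalty:.3f}")
```

The parameter V sets the trade-off: a larger V weights the penalty more heavily and tolerates a longer queue before serving. This rule is greedy per time slot, which is the behavior the abstract contrasts with the long-term objective of RL.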

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Advanced Wireless Network Optimization

Methods: Sparse Evolutionary Training