A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability
Wenhan Xu, Jiashuo Jiang, Lei Deng, Danny Hin-Kwok Tsang

TL;DR
This paper introduces a Lyapunov Drift-Plus-Penalty algorithm tailored for reinforcement learning that balances queue stability against long-term optimization. The authors report theoretical advantages over directly merging the two frameworks, along with better simulation performance.
Contribution
The paper develops LDPTRLQ, an algorithm that integrates Lyapunov Drift-Plus-Penalty with RL and comes with theoretical guarantees and improved stability over existing methods.
Findings
LDPTRLQ outperforms baseline methods in simulations.
The algorithm effectively balances queue stability and long-term rewards.
Theoretical analysis supports the proposed method's advantage over approaches that directly merge Lyapunov Drift-Plus-Penalty with RL.
Abstract
With the proliferation of Internet of Things (IoT) devices, the demand for addressing complex optimization challenges has intensified. The Lyapunov Drift-Plus-Penalty algorithm is a widely adopted approach for ensuring queue stability, and some research has preliminarily explored its integration with reinforcement learning (RL). In this paper, we investigate the adaptation of the Lyapunov Drift-Plus-Penalty algorithm for RL applications. Through rigorous theoretical analysis, we derive an effective method for combining Lyapunov Drift-Plus-Penalty with RL under a set of common and reasonable conditions. Unlike existing approaches that directly merge the two frameworks, our proposed algorithm, the Lyapunov drift-plus-penalty method tailored for reinforcement learning with queue stability (LDPTRLQ), offers theoretical superiority by effectively balancing the greedy optimization…
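For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of the classic (non-RL) drift-plus-penalty rule for a single queue. It is not the LDPTRLQ algorithm from the paper. The action set, the arrival distribution, the weight V, and the names ACTIONS, dpp_action, and simulate are all hypothetical choices made for illustration. At each step the controller greedily picks the action minimizing V * penalty - Q * service, so a larger backlog Q pushes it toward higher service rates even at a higher penalty.

```python
import random

# Hypothetical action set: each action has a per-step penalty (e.g. power)
# and a service amount removed from the queue. Values are illustrative only.
ACTIONS = [
    {"penalty": 0.0, "service": 0.0},  # idle
    {"penalty": 1.0, "service": 1.0},  # low effort
    {"penalty": 3.0, "service": 2.0},  # high effort
]


def dpp_action(q, actions, V):
    """Classic drift-plus-penalty choice for one queue.

    Minimizes V * penalty - q * service, which is the per-step
    drift-plus-penalty bound when arrivals do not depend on the action.
    """
    return min(actions, key=lambda a: V * a["penalty"] - q * a["service"])


def simulate(steps=1000, V=10.0, seed=0):
    """Run the greedy rule; return (average penalty, maximum backlog)."""
    rng = random.Random(seed)
    q = 0.0
    total_penalty = 0.0
    max_q = 0.0
    for _ in range(steps):
        a = dpp_action(q, ACTIONS, V)
        arrival = rng.choice([0.0, 1.0, 2.0])  # assumed arrivals, mean 1.0
        # Standard queue update: serve first, then add new arrivals.
        q = max(q - a["service"], 0.0) + arrival
        total_penalty += a["penalty"]
        max_q = max(max_q, q)
    return total_penalty / steps, max_q


if __name__ == "__main__":
    avg_penalty, max_backlog = simulate()
    print(f"average penalty: {avg_penalty:.3f}, max backlog: {max_backlog:.1f}")
```

In this sketch V controls the trade-off: raising V lets the queue grow larger before the controller pays for service, while lowering V keeps the backlog short at a higher average penalty. The rule is purely greedy per step, which is the behavior the abstract contrasts with RL's long-term objective.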
Taxonomy
Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Advanced Wireless Network Optimization
Methods: Sparse Evolutionary Training
