TL;DR
This paper introduces RSD4, a deep reinforcement learning algorithm that effectively manages multi-user delay-constrained scheduling in dynamic, partially observable environments, outperforming existing methods.
Contribution
The paper presents a novel DRL algorithm, RSD4, incorporating RNNs and dual optimization to handle delay, resource constraints, and partial observability in scheduling tasks.
Findings
RSD4 outperforms existing methods in simulated environments.
The algorithm is robust to system dynamics and partial observability.
Scalability is achieved through user-level decomposition and node merging.
Abstract
Multi-user delay-constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing. Yet, it poses a critical challenge since the scheduler needs to make real-time decisions to guarantee the delay and resource constraints simultaneously without prior information about system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability issues, e.g., due to sensing noise or hidden correlation. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient (RSD4), which is a data-driven method based on a Partially Observed Markov Decision Process (POMDP) formulation. RSD4 guarantees resource and delay constraints by Lagrangian dual and…
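The abstract is cut off before it explains how the Lagrangian dual is used, so the following is only a generic illustration of the idea, not RSD4 itself. It is a minimal sketch of projected dual ascent on a toy one-dimensional problem: maximize sqrt(x) over x in [0, 1] subject to a resource budget x <= budget. The objective, the closed-form `best_response`, the step size, and all function names are assumptions chosen for illustration. In a DRL setting the inner maximization would presumably be replaced by a learned policy.

```python
def best_response(lam: float) -> float:
    """Maximize the toy Lagrangian sqrt(x) - lam * x over x in [0, 1].

    Setting the derivative 1 / (2 * sqrt(x)) - lam to zero gives
    x = 1 / (4 * lam**2), clipped to the feasible interval.
    """
    if lam <= 0.0:
        return 1.0  # with no price on the resource, use as much as allowed
    return min(1.0, 1.0 / (4.0 * lam * lam))


def dual_ascent(budget: float, step: float = 0.1, iters: int = 500):
    """Projected subgradient ascent on the Lagrange multiplier.

    The multiplier rises while the resource is over-used (x > budget),
    falls while it is under-used, and is projected back onto lam >= 0.
    """
    lam = 0.0
    for _ in range(iters):
        x = best_response(lam)
        lam = max(0.0, lam + step * (x - budget))
    return lam, best_response(lam)


if __name__ == "__main__":
    lam, x = dual_ascent(budget=0.25)
    print(f"multiplier = {lam:.4f}, resource use = {x:.4f}")
```

For this toy problem the multiplier settles where the best response just meets the budget, that is, where 1 / (4 * lam**2) equals the budget. The multiplier acts as a price on the resource, which is what lets a constrained problem be handled as an unconstrained one.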
Taxonomy
Methods: Softmax
