Decentralized Task Scheduling in Distributed Systems: A Deep Reinforcement Learning Approach

Daniel Benniah John

arXiv:2603.24738·cs.DC·March 27, 2026

TL;DR

This paper introduces a decentralized deep reinforcement learning framework for task scheduling in heterogeneous distributed systems. It reports a 15.6% reduction in task completion time, a 15.2% improvement in energy efficiency, and an 82.3% SLA satisfaction rate, using a lightweight implementation suitable for edge devices.

Contribution

The paper presents a multi-agent DRL approach, formulated as a Dec-POMDP, with a lightweight NumPy-only actor-critic architecture for scalable, decentralized task scheduling.

Findings

01  15.6% reduction in task completion time

02  15.2% energy efficiency improvement

03  82.3% SLA satisfaction rate

Abstract

Efficient task scheduling in large-scale distributed systems presents significant challenges due to dynamic workloads, heterogeneous resources, and competing quality-of-service requirements. Traditional centralized approaches face scalability limitations and single points of failure, while classical heuristics lack adaptability to changing conditions. This paper proposes a decentralized multi-agent deep reinforcement learning (DRL-MADRL) framework for task scheduling in heterogeneous distributed systems. We formulate the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and develop a lightweight actor-critic architecture implemented using only NumPy, enabling deployment on resource-constrained edge devices without heavyweight machine learning frameworks. Using workload characteristics derived from the publicly available Google Cluster Trace dataset, we…
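To make the "actor-critic implemented using only NumPy" idea concrete, here is a minimal sketch of what a single scheduling agent could look like. It is not the paper's architecture: the class name `TinyActorCritic`, the linear (single-layer) actor and critic, the learning rates, and the toy reward in `toy_episode` are all assumptions chosen for illustration. Each agent sees only a local observation vector, in the spirit of the Dec-POMDP formulation, and picks one of a few candidate nodes.

```python
import numpy as np


class TinyActorCritic:
    """Hypothetical single agent: linear softmax actor plus linear critic."""

    def __init__(self, obs_dim, n_actions, lr_actor=0.01, lr_critic=0.05,
                 gamma=0.95, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((n_actions, obs_dim))  # actor weights (one row per action)
        self.v = np.zeros(obs_dim)               # critic weights
        self.lr_actor = lr_actor
        self.lr_critic = lr_critic
        self.gamma = gamma

    def policy(self, obs):
        """Softmax over action logits, shifted by the max for numerical stability."""
        logits = self.W @ obs
        logits = logits - logits.max()
        p = np.exp(logits)
        return p / p.sum()

    def act(self, obs):
        """Sample an action index from the current policy."""
        return int(self.rng.choice(len(self.W), p=self.policy(obs)))

    def update(self, obs, action, reward, next_obs, done):
        """One-step TD actor-critic update; returns the TD error."""
        bootstrap = 0.0 if done else self.gamma * (self.v @ next_obs)
        td_error = reward + bootstrap - self.v @ obs
        # Critic: move the value estimate toward the TD target.
        self.v += self.lr_critic * td_error * obs
        # Actor: gradient of log softmax w.r.t. logits is onehot(action) - probs.
        grad_logits = -self.policy(obs)
        grad_logits[action] += 1.0
        self.W += self.lr_actor * td_error * np.outer(grad_logits, obs)
        return td_error


def toy_episode(agent, steps=200, seed=1):
    """Toy environment (an assumption): obs[i] is the load of node i,
    and the reward is the negative load of the chosen node."""
    rng = np.random.default_rng(seed)
    n = agent.W.shape[1]
    obs = rng.random(n)
    total = 0.0
    for t in range(steps):
        action = agent.act(obs)
        reward = -obs[action]
        next_obs = rng.random(n)
        agent.update(obs, action, reward, next_obs, done=(t == steps - 1))
        obs = next_obs
        total += reward
    return total
```

In this toy setup the observation dimension equals the number of actions, so an agent would be created as `TinyActorCritic(obs_dim=3, n_actions=3)` and trained by calling `toy_episode(agent)` repeatedly. A decentralized deployment would run one such agent per scheduler, each with its own observations and weights; how the paper coordinates agents is not specified in the text above.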

Taxonomy

Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · IoT and Edge/Fog Computing