AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

Hamed Hamzeh

arXiv:2603.12031·cs.DC·May 8, 2026

TL;DR

This paper introduces AGMARL-DKS, a scalable, stress-aware multi-agent reinforcement learning scheduler for Kubernetes that improves resource utilization, fault tolerance, and cost efficiency in dynamic cloud environments.

Contribution

The paper presents a multi-agent RL approach that combines graph neural networks with lexicographical ordering of objectives for adaptive, stress-aware scheduling in Kubernetes clusters.

Findings

01 Outperforms the default scheduler on Google Kubernetes Engine (GKE) in fault tolerance, utilization, and cost

02 Uses a multi-agent system with a graph neural network (GNN) for global context awareness

03 Employs stress-aware lexicographical ordering for multi-objective trade-offs
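The difference between a static linear combination of objectives and a lexicographical ordering can be illustrated with a small sketch. This is not the paper's implementation: the node scores, the tolerance `tol`, the stress threshold, and the function names `weighted_sum_pick`, `lexicographic_pick`, and `priority_for` are all hypothetical, chosen only to show the general technique. A lexicographical scheme filters candidates objective by objective in priority order, and a stress-aware variant is assumed here to reorder those priorities when cluster stress is high.

```python
from typing import Dict, List, Sequence

# Hypothetical per-node objective scores in [0, 1], higher is better.
Scores = Dict[str, float]


def weighted_sum_pick(nodes: Dict[str, Scores], weights: Scores) -> str:
    """Static linear scalarisation: one fixed weight per objective."""
    return max(nodes, key=lambda n: sum(weights[k] * nodes[n][k] for k in weights))


def lexicographic_pick(
    nodes: Dict[str, Scores], priority: Sequence[str], tol: float = 0.05
) -> str:
    """Filter candidates objective by objective, in priority order.

    Nodes within `tol` of the best score on an objective count as tied
    and are passed on to the next objective in the list.
    """
    candidates = list(nodes)
    for objective in priority:
        best = max(nodes[n][objective] for n in candidates)
        candidates = [n for n in candidates if nodes[n][objective] >= best - tol]
        if len(candidates) == 1:
            break
    return candidates[0]


def priority_for(stress: float, threshold: float = 0.8) -> List[str]:
    """Assumed stress rule: put stability first when the cluster is stressed."""
    if stress >= threshold:
        return ["stability", "utilisation", "cost"]
    return ["cost", "utilisation", "stability"]


# Illustrative data only; these numbers do not come from the paper.
nodes = {
    "node-a": {"stability": 0.9, "utilisation": 0.5, "cost": 0.3},
    "node-b": {"stability": 0.6, "utilisation": 0.7, "cost": 0.9},
}

calm_choice = lexicographic_pick(nodes, priority_for(stress=0.2))
stressed_choice = lexicographic_pick(nodes, priority_for(stress=0.9))
linear_choice = weighted_sum_pick(
    nodes, {"stability": 1 / 3, "utilisation": 1 / 3, "cost": 1 / 3}
)
```

With these made-up scores, the fixed-weight sum always prefers the cheaper node, while the lexicographical pick switches to the more stable node once the assumed stress threshold is crossed.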

Abstract

State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research efforts have explored the use of reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most of these schedulers use monolithic centralised agents, which are non-scalable for large heterogeneous clusters. Second, the ones that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that can react adaptively to dynamic conditions. To address these gaps in current research, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic…
