A Kubernetes custom scheduler based on reinforcement learning for compute-intensive pods
Hanlin Zhou, Huah Yong Chan, Shun Yao Zhang, Meie Lin, Jingfei Ni

TL;DR
This paper introduces reinforcement learning-based custom schedulers for Kubernetes that significantly improve resource utilization and energy efficiency for compute-intensive pods, outperforming default and other AI-based schedulers.
Contribution
It presents two novel reinforcement learning schedulers, SDQN and SDQN-n, tailored for compute-intensive workloads, demonstrating superior performance over existing methods.
Findings
SDQN reduces average CPU utilization per cluster node by 10%.
SDQN-n reduces average CPU utilization per cluster node by over 20%.
The proposed schedulers enhance energy efficiency and resource consolidation.
Abstract
With the rise of cloud computing and lightweight containers, Docker has emerged as a leading technology for rapid service deployment, with Kubernetes responsible for pod orchestration. However, for compute-intensive workloads, particularly web services executing containerized machine-learning training, the default Kubernetes scheduler does not always achieve optimal placement. To address this, we propose two custom, reinforcement-learning-based schedulers, SDQN and SDQN-n, both built on the Deep Q-Network (DQN) framework. In compute-intensive scenarios, these models outperform the default Kubernetes scheduler as well as Transformer- and LSTM-based alternatives, reducing average CPU utilization per cluster node by 10%, and by over 20% when using SDQN-n. Moreover, our results show that the SDQN-n approach of consolidating pods onto fewer nodes further amplifies resource savings and helps advance…
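The abstract names the DQN framework but does not describe the state, action, or reward design used by SDQN or SDQN-n. As a rough illustration only, the sketch below shows two generic DQN ingredients applied to node selection: epsilon-greedy choice among nodes that can fit a pod, and the standard one-step temporal-difference target. The function names, the per-node Q-value list, and the free-CPU feasibility check are all assumptions made for this example and are not taken from the paper.

```python
import random
from typing import List, Optional, Sequence


def feasible_nodes(free_cpu: Sequence[float], pod_cpu: float) -> List[int]:
    """Return indices of nodes with enough free CPU for the pod (assumed filter)."""
    return [i for i, free in enumerate(free_cpu) if free >= pod_cpu]


def select_node(
    q_values: Sequence[float],
    free_cpu: Sequence[float],
    pod_cpu: float,
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Epsilon-greedy node choice restricted to feasible nodes.

    q_values[i] is a hypothetical Q-value estimate for placing the pod on node i.
    Returns None when no node can fit the pod.
    """
    rng = rng or random.Random()
    candidates = feasible_nodes(free_cpu, pod_cpu)
    if not candidates:
        return None
    if rng.random() < epsilon:
        # Explore: pick any feasible node at random.
        return rng.choice(candidates)
    # Exploit: pick the feasible node with the highest Q-value.
    return max(candidates, key=lambda i: q_values[i])


def td_target(
    reward: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Standard one-step DQN target: r + gamma * max_a' Q(s', a')."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    # Three hypothetical nodes; node 1 has too little free CPU for a 1.0-core pod.
    node = select_node([0.2, 0.9, 0.5], [4.0, 0.5, 2.0], pod_cpu=1.0, epsilon=0.0)
    print("chosen node:", node)
    print("TD target:", td_target(1.0, [0.5, 2.0], gamma=0.5))
```

In a full DQN the Q-values would come from a neural network evaluated on a cluster-state encoding, and the reward would encode the scheduling objective; neither is specified in this summary, so both are left abstract here.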
Taxonomy
Topics: Cloud Computing and Resource Management · Software-Defined Networks and 5G · IoT and Edge/Fog Computing
