Resource Allocation in Disaggregated Data Centre Systems with Reinforcement Learning

Zacharaya Shabka; Georgios Zervas

arXiv:2106.02412·cs.LG·November 12, 2021

TL;DR

This paper introduces a reinforcement learning approach using graph neural networks for resource allocation in disaggregated data centres, significantly improving efficiency and network resource usage over previous heuristics.

Contribution

The paper is the first to apply RL with graph neural networks to resource allocation in resource-disaggregated data centres (RDDC), and it reports better performance and scalability than hand-engineered heuristics.

Findings

01. The RL policy outperforms heuristics in acceptance ratio, CPU utilisation, and memory utilisation.

02. It maintains performance on larger topologies with 100x more nodes.

03. It uses 5.3x less network resources than baselines.

Abstract

Resource-disaggregated data centres (RDDC) propose a resource-centric and high-utilisation architecture for data centres (DC), avoiding resource fragmentation and enabling arbitrarily sized resource pools to be allocated to tasks, rather than server-sized ones. RDDCs typically impose greater demand on the network, requiring more infrastructure and increasing cost and power, so new resource allocation algorithms that co-manage both server and network resources are essential to ensure that allocation is not bottlenecked by the network, and that requests can be served successfully with minimal networking resources. We apply reinforcement learning (RL) to this problem for the first time and show that an RL policy based on graph neural networks can learn resource allocation policies end-to-end that outperform previous hand-engineered heuristics by up to 22.0%, 42.6% and 22.6% for…
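As a rough illustration of the fragmentation argument in the abstract, the toy sketch below compares a server-centric first-fit allocator with an idealised pooled allocator. It is not the paper's method: the function names, the `Request` type, and the example capacities are all hypothetical, and the pooled model deliberately ignores the network constraints that the abstract says must be co-managed.

```python
from dataclasses import dataclass


@dataclass
class Request:
    cpu: int  # CPU units requested (hypothetical unit)
    mem: int  # memory units requested (hypothetical unit)


def server_centric_accept(requests, servers):
    """First-fit: a request is accepted only if one server can hold all of it."""
    free = [[cpu, mem] for cpu, mem in servers]  # remaining [cpu, mem] per server
    accepted = 0
    for r in requests:
        for s in free:
            if s[0] >= r.cpu and s[1] >= r.mem:
                s[0] -= r.cpu
                s[1] -= r.mem
                accepted += 1
                break
    return accepted


def disaggregated_accept(requests, servers):
    """Idealised pooling: CPU and memory come from data-centre-wide pools.

    Assumption: network capacity is unlimited, which a real RDDC cannot assume.
    """
    cpu_free = sum(cpu for cpu, _ in servers)
    mem_free = sum(mem for _, mem in servers)
    accepted = 0
    for r in requests:
        if cpu_free >= r.cpu and mem_free >= r.mem:
            cpu_free -= r.cpu
            mem_free -= r.mem
            accepted += 1
    return accepted


# Two servers with 8 CPU and 8 memory units each (made-up numbers).
servers = [(8, 8), (8, 8)]
requests = [Request(6, 2), Request(6, 2), Request(4, 12)]

server_ratio = server_centric_accept(requests, servers) / len(requests)
pooled_ratio = disaggregated_accept(requests, servers) / len(requests)
print(f"server-centric acceptance ratio: {server_ratio:.2f}")
print(f"pooled acceptance ratio:         {pooled_ratio:.2f}")
```

In this example the third request needs more memory than any single server offers, so the server-centric allocator rejects it even though the data centre as a whole has enough free CPU and memory. The pooled allocator accepts it, but only because the sketch treats the network as free.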

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software-Defined Networks and 5G