Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center
Zhiyuan Yao, Zihan Ding, Thomas Clausen

TL;DR
This paper applies multi-agent reinforcement learning to network load balancing in data centers, demonstrating superior performance over traditional heuristics in realistic emulation environments.
Contribution
It formulates the load balancing problem as a Dec-POMDP and trains MARL methods directly on emulation systems, highlighting their advantages and challenges.
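For readers unfamiliar with the term, the standard textbook definition of a Dec-POMDP is sketched below. The symbols follow common usage and are not necessarily the paper's notation. Reading the agents as the individual load balancers is an assumption drawn from the summary above.

```latex
% Standard Dec-POMDP tuple (generic notation, not taken from the paper)
\[
  \mathcal{M} = \langle \mathcal{I}, \mathcal{S}, \{\mathcal{A}_i\}_{i \in \mathcal{I}},
                P, R, \{\Omega_i\}_{i \in \mathcal{I}}, O, \gamma \rangle
\]
% \mathcal{I}: set of agents (assumed here: the load balancers)
% \mathcal{S}: global states; \mathcal{A}_i: actions of agent i
% P(s' \mid s, \mathbf{a}): transition function under the joint action \mathbf{a}
% R(s, \mathbf{a}): a single reward shared by all agents (cooperative setting)
% \Omega_i, O(\mathbf{o} \mid s', \mathbf{a}): local observations and their distribution
% \gamma \in [0, 1): discount factor
\[
  \max_{\pi_1, \dots, \pi_{|\mathcal{I}|}} \;
  \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, \mathbf{a}_t) \right],
  \qquad a_{i,t} \sim \pi_i(\cdot \mid o_{i,0}, \dots, o_{i,t})
\]
```

Each agent acts only on its own observation history, yet all agents are scored by the same reward. This is what separates a cooperative formulation from independent per-agent optimization.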
Findings
MARL outperforms traditional heuristics in realistic settings.
Independent, "selfish" load balancing strategies are not necessarily globally optimal.
The paper analyzes the difficulties of applying MARL to network load balancing.
Abstract
This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible under changing workload distributions and arrival rates, and balance load poorly across multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system at moderate to large scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has superior performance across different realistic settings.…
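The two heuristic baselines named in the abstract can be illustrated with a minimal Python sketch. This is a simplified reading of the general ideas (weighted random assignment for WCMP, fewest locally counted jobs for LSQ), not the paper's implementation. The function names `wcmp_pick` and `lsq_pick` and their parameters are hypothetical.

```python
import random
from typing import Sequence


def wcmp_pick(weights: Sequence[float], rng: random.Random) -> int:
    """WCMP-style choice (assumed form): pick server i with probability
    proportional to a static weight weights[i]."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def lsq_pick(local_queue_lengths: Sequence[int]) -> int:
    """LSQ-style choice (assumed form): pick the server with the fewest jobs
    as counted locally by this load balancer. Ties go to the lowest index."""
    return min(range(len(local_queue_lengths)),
               key=local_queue_lengths.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    print(wcmp_pick([1.0, 2.0, 1.0], rng))  # index drawn by weight
    print(lsq_pick([3, 1, 2]))              # index of the shortest local queue
```

In this sketch each load balancer sees only its own counts, so several balancers running `lsq_pick` independently can all send traffic to the same server. That limited view is the kind of partial observability the Dec-POMDP framing is meant to capture.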
