Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game
Zhiyuan Yao, Zihan Ding

TL;DR
This paper introduces a distributed multi-agent reinforcement learning approach for network load balancing in data centers. It addresses heterogeneity, dynamics, and partial observability, and reports close-to-optimal performance in simulation and better results than existing load balancers on real systems.
Contribution
It formulates load balancing as a Markov potential game and proposes a fully distributed MARL algorithm, reducing communication overhead and improving performance.
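To illustrate the potential-game idea, here is a minimal sketch of a static, one-shot toy: each load balancer picks one server, and its cost is the load on that server divided by the server's capacity. This is a standard congestion game with a Rosenthal potential, not the paper's Markov game, reward design, or learning algorithm. The names `cost`, `potential`, and `best_response_dynamics` are hypothetical. The defining property is that when one agent deviates on its own, its cost changes by exactly the same amount as the shared potential, so selfish improvement steps also decrease the potential and must eventually stop at an equilibrium.

```python
def loads(choices, n_servers):
    """Count how many load balancers currently send traffic to each server."""
    counts = [0] * n_servers
    for s in choices:
        counts[s] += 1
    return counts


def cost(i, choices, capacity):
    """Cost seen by agent i: load on its chosen server, scaled by capacity."""
    counts = loads(choices, len(capacity))
    s = choices[i]
    return counts[s] / capacity[s]


def potential(choices, capacity):
    """Rosenthal potential: for each server, sum k / capacity for k = 1..load."""
    counts = loads(choices, len(capacity))
    return sum(
        k / capacity[s]
        for s, n in enumerate(counts)
        for k in range(1, n + 1)
    )


def best_response_dynamics(choices, capacity, max_rounds=100):
    """Let agents take turns switching to their cheapest server.

    Every strict improvement lowers the potential, so the loop stops at a
    profile where no single agent can lower its own cost (a Nash equilibrium).
    """
    choices = list(choices)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(choices)):
            best_s, best_c = choices[i], cost(i, choices, capacity)
            for s in range(len(capacity)):
                trial = choices[:i] + [s] + choices[i + 1:]
                c = cost(i, trial, capacity)
                if c < best_c - 1e-12:  # accept strict improvements only
                    best_s, best_c = s, c
            if best_s != choices[i]:
                choices[i] = best_s
                changed = True
        if not changed:
            break
    return choices


# Assumed toy setup: three servers of unequal capacity, seven agents
# that all start on the weakest server.
capacity = [1.0, 2.0, 4.0]
final = best_response_dynamics([0] * 7, capacity)
print(final, loads(final, len(capacity)))
```

Each agent here acts only on its own cost, with no central coordinator, which is the appeal of a potential-game formulation for distributed settings. The paper's setting adds state dynamics and partial observability, which this sketch ignores.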
Findings
Close-to-optimal performance in simulations
Superior results over existing load balancers in real systems
Effective handling of heterogeneous and dynamic environments
Abstract
This paper investigates the network load balancing problem in data centers (DCs) where multiple load balancers (LBs) are deployed, using the multi-agent reinforcement learning (MARL) framework. The challenges of this problem consist of the heterogeneous processing architecture and dynamic environments, as well as limited and partial observability of each LB agent in distributed networking systems, which can largely degrade the performance of in-production load balancing algorithms in real-world setups. The centralised-training-decentralised-execution (CTDE) RL scheme has been proposed to improve MARL performance, yet it incurs additional communication and management overhead among agents, especially in distributed networking systems, which prefer distributed and plug-and-play design schemes. We formulate the multi-agent load balancing problem as a Markov potential game, with a carefully…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed Control Multi-Agent Systems · Age of Information Optimization
