Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game

Zhiyuan Yao; Zihan Ding

arXiv:2206.01451 · cs.AI · October 17, 2022

TL;DR

This paper introduces a distributed multi-agent reinforcement learning approach to network load balancing in data centers. The approach addresses heterogeneity, dynamics, and partial observability, achieving close-to-optimal performance in simulations and outperforming existing load balancers in real-world systems.

Contribution

The paper formulates load balancing as a Markov potential game and proposes a fully distributed MARL algorithm, which reduces communication overhead among agents and improves performance.
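
For readers unfamiliar with the term, the general definition of a Markov potential game can be sketched as below. This is the generic condition, not the paper's specific formulation: the symbols $V_i$ (agent $i$'s value function), $\Phi$ (the potential function), $\pi_i$ (agent $i$'s policy), and $\pi_{-i}$ (the policies of all other agents) are notation assumed here for illustration.

```latex
% Generic Markov potential game condition.
% Notation (V_i, \Phi, \pi_i, \pi_{-i}) is assumed for illustration
% and is not taken from the paper.
\[
V_i^{(\pi_i,\,\pi_{-i})}(s) - V_i^{(\pi_i',\,\pi_{-i})}(s)
  = \Phi^{(\pi_i,\,\pi_{-i})}(s) - \Phi^{(\pi_i',\,\pi_{-i})}(s)
\qquad \forall i,\ \forall s,\ \forall \pi_i,\ \pi_i',\ \pi_{-i}.
\]
```

Under this condition, any change in one agent's value from a unilateral policy change equals the change in the shared potential, so agents that improve their own value independently also increase $\Phi$. This alignment is what makes a fully distributed learning scheme plausible in such games.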

Findings

01. Close-to-optimal performance in simulations

02. Superior results over existing load balancers in real systems

03. Effective handling of heterogeneous and dynamic environments

Abstract

This paper investigates the network load balancing problem in data centers (DCs) where multiple load balancers (LBs) are deployed, using the multi-agent reinforcement learning (MARL) framework. The problem is challenging because of heterogeneous processing architectures, dynamic environments, and the limited, partial observability of each LB agent in distributed networking systems, all of which can substantially degrade the performance of in-production load balancing algorithms in real-world setups. The centralised-training-decentralised-execution (CTDE) RL scheme has been proposed to improve MARL performance, yet it incurs additional communication and management overhead among agents, especially in distributed networking systems, which favour a distributed, plug-and-play design. We formulate the multi-agent load balancing problem as a Markov potential game, with a carefully…

Code & Models

Repositories

zhiyuanyaoj/marllb · PyTorch · Official

Videos

Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game · SlidesLive

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed Control Multi-Agent Systems · Age of Information Optimization