Pull-based load distribution in large-scale heterogeneous service systems
Alexander Stolyar

TL;DR
This paper analyzes a pull-based load distribution algorithm in large-scale heterogeneous cloud systems, proving its asymptotic optimality in minimizing customer blocking and waiting as system size grows.
Contribution
It introduces and proves the asymptotic optimality of the PULL load distribution algorithm in large, heterogeneous server pools under sub-critical load.
Findings
Under sub-critical load, the probability that an arriving customer is blocked or has to wait vanishes asymptotically as system size grows.
The PULL algorithm remains optimal under various generalizations.
PULL balances load effectively across large-scale heterogeneous server pools.
Abstract
The model is motivated by the problem of load distribution in large-scale cloud-based data processing systems. We consider a heterogeneous service system, consisting of multiple large server pools. The pools are different in that their servers may have different processing speed and/or different buffer sizes (which may be finite or infinite). We study an asymptotic regime in which the customer arrival rate and pool sizes scale to infinity simultaneously, in proportion to some scaling parameter. Arriving customers are assigned to the servers by a "router", according to a pull-based algorithm, called PULL. Under the algorithm, each server sends a "pull-message" to the router when it becomes idle; the router assigns an arriving customer to a server according to a randomly chosen available pull-message, if there are any, or to a random server otherwise. Assuming sub-critical…
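The routing rule described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's model or code: the class name `PullRouter`, its method names, and the outcome labels are hypothetical. The sketch also assumes that every server starts idle with one pull-message at the router, that the fallback "random server" is drawn uniformly from all servers, and that each buffer size counts the customer in service and is at least 1.

```python
import random


class PullRouter:
    """Toy router for the PULL rule (illustrative names, not from the paper)."""

    def __init__(self, buffer_sizes, seed=0):
        # buffer_sizes[i]: max customers server i can hold, including the one
        # in service (assumed >= 1); use float("inf") for an infinite buffer.
        self.buffer_sizes = list(buffer_sizes)
        self.queue_len = [0] * len(self.buffer_sizes)
        # Assumption: all servers start idle, each with one pull-message.
        self.pull_messages = list(range(len(self.buffer_sizes)))
        self.rng = random.Random(seed)

    def route(self):
        """Assign one arriving customer; return (server, outcome)."""
        if self.pull_messages:
            # Pick an available pull-message uniformly at random and consume it.
            k = self.rng.randrange(len(self.pull_messages))
            msgs = self.pull_messages
            msgs[k], msgs[-1] = msgs[-1], msgs[k]
            server = msgs.pop()
            self.queue_len[server] += 1
            return server, "served_immediately"
        # No pull-messages: fall back to a uniformly random server (assumption).
        server = self.rng.randrange(len(self.queue_len))
        if self.queue_len[server] >= self.buffer_sizes[server]:
            return server, "blocked"
        self.queue_len[server] += 1
        return server, "waiting"

    def complete_service(self, server):
        """Record a departure; an emptied server sends a new pull-message."""
        if self.queue_len[server] == 0:
            raise ValueError("server has no customer to complete")
        self.queue_len[server] -= 1
        if self.queue_len[server] == 0:
            self.pull_messages.append(server)
```

In this sketch, a customer is blocked or made to wait only when no pull-message is available, which is the event whose probability the paper shows to vanish asymptotically under sub-critical load.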
