Scalable Load Balancing in the Presence of Heterogeneous Servers
Kristen Gardner, Jazeem Abdul Jaleel, Alexander Wickeham, Sherwin Doroudi

TL;DR
This paper introduces heterogeneity-aware load balancing policies for large-scale systems with diverse server speeds, improving response times and stability by leveraging server speed information in dispatching decisions.
Contribution
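The paper adapts the power-of-d versions of Join-the-Idle-Queue and Join-the-Shortest-Queue to heterogeneous environments, yielding two families of dispatching policies that are analytically tractable, maintain stability, and outperform existing heterogeneity-aware policies such as Shortest-Expected-Delay.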
Findings
The mean response time analysis of the policies becomes exact as the number of servers grows large
Proposed policies outperform Shortest-Expected-Delay in response time
Policies ensure maximal stability in heterogeneous systems
Abstract
Heterogeneity is becoming increasingly ubiquitous in modern large-scale computer systems. Developing good load balancing policies for systems whose resources have varying speeds is crucial in achieving low response times. Indeed, how best to dispatch jobs to servers is a classical and well-studied problem in the queueing literature. Yet the bulk of existing work on large-scale systems assumes homogeneous servers; unfortunately, policies that perform well in the homogeneous setting can cause unacceptably poor performance---or even instability---in heterogeneous systems. We adapt the "power-of-d" versions of both the Join-the-Idle-Queue and Join-the-Shortest-Queue policies to design two corresponding families of heterogeneity-aware dispatching policies, each of which is parameterized by a pair of routing probabilities. Unlike their heterogeneity-unaware counterparts, our policies use…
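To make the dispatching idea concrete, here is a minimal Python sketch, not the authors' exact policy definitions. `jsq_d` is the standard heterogeneity-unaware Join-the-Shortest-Queue-d rule. `speed_aware_dispatch` is a hypothetical illustration of how a pair of routing probabilities could bring server speed into the decision. The two-class fast/slow model, the parameter names `p_fast_idle` and `p_fast_busy`, and what each probability conditions on are all assumptions made here, since the abstract excerpt does not specify them.

```python
import random


def jsq_d(queue_lengths, d, rng=random):
    """Classic JSQ-d: poll d servers uniformly at random and
    send the job to the polled server with the shortest queue."""
    polled = rng.sample(range(len(queue_lengths)), d)
    return min(polled, key=lambda i: queue_lengths[i])


def speed_aware_dispatch(queue_lengths, is_fast, d,
                         p_fast_idle, p_fast_busy, rng=random):
    """Hypothetical speed-aware variant (an assumption, not the paper's rule).

    Poll d servers, find the shortest polled queue within each speed class,
    then pick the fast class with a probability that depends on whether
    either candidate is idle.
    """
    polled = rng.sample(range(len(queue_lengths)), d)
    fast = [i for i in polled if is_fast[i]]
    slow = [i for i in polled if not is_fast[i]]

    # If only one speed class was polled, fall back to plain shortest queue.
    if not fast or not slow:
        return min(polled, key=lambda i: queue_lengths[i])

    best_fast = min(fast, key=lambda i: queue_lengths[i])
    best_slow = min(slow, key=lambda i: queue_lengths[i])

    # Choose which routing probability applies (assumed conditioning).
    any_idle = queue_lengths[best_fast] == 0 or queue_lengths[best_slow] == 0
    p_fast = p_fast_idle if any_idle else p_fast_busy

    return best_fast if rng.random() < p_fast else best_slow
```

Setting both probabilities to 1.0 always routes to the best polled fast server, and setting both to 0.0 always routes to the best polled slow server. Intermediate values trade off between the two classes.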
