Balanced allocation: considerations from large scale service environments
Amer Diwan, Prabhakar Raghavan, Eli Upfal

TL;DR
This paper investigates d-way balanced allocation in large-scale systems. It focuses on the scheme's robustness to request bursts, mixed job priorities, and noisy information, and supports its analysis with simulations and analytical models.
Contribution
It extends the understanding of d-way balanced allocation by analyzing its performance during request bursts, under mixed job priorities, and with noisy information in large-scale environments.
Findings
Quick recovery from request bursts
Graceful handling of job priorities
Robustness to noisy information
Abstract
We study d-way balanced allocation, which assigns each incoming job to the lightest loaded among d randomly chosen servers. While prior work has extensively studied the performance of the basic scheme, there has been less published work on adapting this technique to many aspects of large-scale systems. Based on our experience in building and running planet-scale cloud applications, we extend the understanding of d-way balanced allocation along the following dimensions: (i) Bursts: Events such as breaking news can produce bursts of requests that may temporarily exceed the servicing capacity of the system. Thus, we explore what happens during a burst and how long it takes for the system to recover from such bursts. (ii) Priorities: Production systems need to handle jobs with a mix of priorities (e.g., user facing requests may be high priority while other requests may be low priority).…
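The basic scheme described in the abstract, together with the burst scenario it motivates, can be explored with a small discrete-time simulation. This is a minimal sketch, not the authors' simulator. The function names, the fixed number of arrivals per step, the one-job-per-step service model, the burst window, sampling the d candidates without replacement, and the Gaussian noise model for imperfect load information are all hypothetical assumptions.

```python
import random


def pick_server(queues, d, rng, noise=0.0):
    """d-way balanced allocation: sample d servers uniformly at random and
    return the one whose observed queue length is smallest.

    Assumptions (hypothetical, not from the paper): the d candidates are
    distinct (sampled without replacement), and noise > 0 perturbs each
    observed length with Gaussian noise to mimic imperfect load information.
    """
    candidates = rng.sample(range(len(queues)), d)

    def observed(i):
        error = rng.gauss(0.0, noise) if noise > 0 else 0.0
        return queues[i] + error

    return min(candidates, key=observed)


def simulate(n_servers=100, d=2, steps=200, base_rate=0.9,
             burst=(50, 60), burst_rate=1.5, noise=0.0, seed=0):
    """Hypothetical discrete-time model: each step, int(rate * n_servers)
    jobs arrive, and then each server completes at most one job. During the
    half-open window burst = (start, end), arrivals exceed total capacity.
    Returns the maximum queue length observed after each step."""
    rng = random.Random(seed)
    queues = [0] * n_servers
    max_loads = []
    for t in range(steps):
        rate = burst_rate if burst[0] <= t < burst[1] else base_rate
        for _ in range(int(rate * n_servers)):
            queues[pick_server(queues, d, rng, noise)] += 1
        # Service: every non-empty server finishes one job this step.
        queues = [max(q - 1, 0) for q in queues]
        max_loads.append(max(queues))
    return max_loads


def recovery_time(max_loads, burst_end, threshold):
    """Number of steps after the burst ends until the maximum queue length
    first drops back to `threshold` (e.g. the pre-burst maximum), or None
    if that never happens within the simulated horizon."""
    for t in range(burst_end, len(max_loads)):
        if max_loads[t] <= threshold:
            return t - burst_end
    return None


if __name__ == "__main__":
    loads = simulate(steps=400)
    baseline = max(loads[:50])  # worst pre-burst maximum queue length
    print("baseline max load:", baseline)
    print("peak during burst:", max(loads[50:60]))
    print("steps to recover:", recovery_time(loads, 60, baseline))
```

Passing `d=1` reduces the sketch to purely random assignment, and setting `noise > 0` gives a crude way to probe sensitivity to inaccurate load information. The sketch makes no claim about the specific results the paper reports.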
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Advanced Queuing Theory Analysis
