Load is not what you should balance: Introducing Prequal
Bartek Wydrowski, Robert Kleinberg, Stephen M. Rumble, Aaron Archer

TL;DR
Prequal is a load balancer that reduces request latency in distributed systems by probing servers for their estimated latency and requests in flight (RIF), rather than balancing CPU load, which improves tail latency and resource utilization.
Contribution
Prequal introduces a new load balancing approach based on latency and active requests, extending the power-of-d-choices paradigm with asynchronous probing for better latency management.
Findings
Significantly reduces tail latency and error rates.
Decreases resource usage and improves system utilization.
Successfully deployed at YouTube for over two years.
Abstract
We present Prequal (Probing to Reduce Queuing and Latency), a load balancer for distributed multi-tenant systems. Prequal aims to minimize real-time request latency in the presence of heterogeneous server capacities and non-uniform, time-varying antagonist load. It actively probes server load to leverage the power-of-d-choices paradigm, extending it with asynchronous and reusable probes. Cutting against received wisdom, Prequal does not balance CPU load, but instead selects servers according to estimated latency and active requests-in-flight (RIF). We explore its major design features on a testbed system and evaluate it on YouTube, where it has been deployed for more than two years. Prequal has dramatically decreased tail latency, error rates, and resource use, enabling YouTube and other production systems at Google to run at much higher utilization.
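The selection idea in the abstract can be illustrated with a minimal sketch: probe a few servers at random (power of d choices), then pick one using the two reported signals, RIF and estimated latency. The abstract does not say how the two signals are combined, so the rule here (prefer the lowest-latency server among those at or below a RIF threshold, else fall back to the lowest RIF) is an assumption made for illustration. The names `Probe`, `sample_probes`, `choose_server`, and `rif_threshold` are hypothetical, and the asynchronous, reusable probing described above is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Probe:
    """One probe response (hypothetical shape, for illustration only)."""
    server: str
    rif: int            # requests in flight reported by the server
    latency_ms: float   # latency estimate reported by the server


def sample_probes(fleet, d, rng=random):
    """Probe d servers chosen uniformly at random (power of d choices).

    `fleet` is a list of Probe objects standing in for live servers.
    """
    return rng.sample(fleet, min(d, len(fleet)))


def choose_server(probes, rif_threshold):
    """Pick a server from a small pool of probe responses.

    Assumed rule: servers at or below `rif_threshold` compete on
    estimated latency; if every server is above it, take the one with
    the fewest requests in flight.
    """
    if not probes:
        raise ValueError("need at least one probe response")
    lightly_loaded = [p for p in probes if p.rif <= rif_threshold]
    if lightly_loaded:
        return min(lightly_loaded, key=lambda p: p.latency_ms).server
    return min(probes, key=lambda p: p.rif).server


if __name__ == "__main__":
    fleet = [
        Probe("a", rif=2, latency_ms=30.0),
        Probe("b", rif=9, latency_ms=5.0),
        Probe("c", rif=3, latency_ms=12.0),
    ]
    pool = sample_probes(fleet, d=2)
    print(choose_server(pool, rif_threshold=5))
```

Note that CPU utilization never enters the decision: only the probed RIF and latency values are consulted, which is the contrast with CPU-load balancing that the abstract draws.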
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed systems and fault tolerance
