Unbiased Experiments in Congested Networks
Bruce Spang, Veronica Hannan, Shravya Kunamalla, Te-Yuan Huang, Nick McKeown, Ramesh Johari

TL;DR
This paper reveals that congestion in networks biases A/B testing results for new algorithms, leading to inaccurate performance evaluations, and proposes alternative designs to mitigate this bias.
Contribution
It identifies the bias caused by network congestion in A/B tests and introduces alternative experimental designs to improve accuracy in network algorithm evaluation.
Findings
A/B tests can significantly misestimate algorithm performance due to congestion bias.
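In lab A/B tests of two widely used congestion control algorithms, the treatment appeared to deliver 150% higher throughput when used by a few flows and 75% lower throughput when used by most flows, even though the two algorithms have identical throughput when used by all traffic.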
Real-world tests with Netflix show that congestion bias affects metric estimates and effect sizes.
Abstract
When developing a new networking algorithm, it is established practice to run a randomized experiment, or A/B test, to evaluate its performance. In an A/B test, traffic is randomly allocated between a treatment group, which uses the new algorithm, and a control group, which uses the existing algorithm. However, because networks are congested, both treatment and control traffic compete against each other for resources in a way that biases the outcome of these tests. This bias can have a surprisingly large effect; for example, in lab A/B tests with two widely used congestion control algorithms, the treatment appeared to deliver 150% higher throughput when used by a few flows, and 75% lower throughput when used by most flows, despite the fact that the two algorithms have identical throughput when used by all traffic. Beyond the lab, we show that A/B tests can also be biased at scale. In…
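
To make the interference described in the abstract concrete, here is a minimal sketch in Python. It is a toy model and not the paper's experimental setup: it assumes a single bottleneck link shared in proportion to per-flow weights, with treatment flows given a hypothetical weight of 2.0 (an arbitrary illustrative value) and control flows a weight of 1.0. The function names, the capacity, and the weight are all assumptions introduced for illustration. Under these assumptions, an A/B test reports a large throughput gain for the treatment at every allocation, while switching all traffic to the treatment changes per-flow throughput by nothing. This toy model does not reproduce the sign flip reported in the abstract; it only shows how competition for a shared link can separate the A/B estimate from the effect of a full deployment.

```python
def per_flow_throughput(n_treatment, n_control, capacity=100.0, weight=2.0):
    """Per-flow throughput on one shared bottleneck (toy weighted-sharing model).

    Assumption: each flow receives capacity * its_weight / sum_of_all_weights,
    where treatment flows have `weight` and control flows have weight 1.0.
    Returns (treatment_throughput, control_throughput); an entry is None
    when that group has no flows.
    """
    total_weight = n_treatment * weight + n_control * 1.0
    treatment = capacity * weight / total_weight if n_treatment > 0 else None
    control = capacity * 1.0 / total_weight if n_control > 0 else None
    return treatment, control


def ab_estimate(n_flows, treated_fraction, **model):
    """Relative throughput difference an A/B test would report.

    Both groups share the bottleneck at the same time, so the treatment
    gains throughput at the control's expense.
    """
    n_treatment = round(n_flows * treated_fraction)
    n_control = n_flows - n_treatment
    if n_treatment == 0 or n_control == 0:
        raise ValueError("an A/B test needs at least one flow in each group")
    treatment, control = per_flow_throughput(n_treatment, n_control, **model)
    return (treatment - control) / control


def total_treatment_effect(n_flows, **model):
    """Relative throughput change when ALL flows switch to the treatment."""
    all_treatment, _ = per_flow_throughput(n_flows, 0, **model)
    _, all_control = per_flow_throughput(0, n_flows, **model)
    return (all_treatment - all_control) / all_control


if __name__ == "__main__":
    n_flows = 10
    for fraction in (0.1, 0.5, 0.9):
        estimate = ab_estimate(n_flows, fraction)
        print(f"allocation {fraction:.0%}: A/B estimate {estimate:+.0%}")
    print(f"all flows treated: true effect {total_treatment_effect(n_flows):+.0%}")
```

In this sketch the gap between `ab_estimate` and `total_treatment_effect` comes entirely from the two groups sharing one link, which is the kind of interference the abstract attributes to congestion.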
