Confidence intervals for AB-test
Cyrille Dubarry

TL;DR
This paper introduces a rigorous mathematical framework for AB-testing and proposes three algorithms, based on bootstrapping and the central limit theorem, that compute reliable confidence intervals for a range of metrics, so that experimenters can decide more reliably when to stop a test.
Contribution
It presents a new mathematical model for AB-tests and three novel algorithms for confidence interval estimation applicable to diverse metrics beyond standard success probabilities.
Findings
Algorithms extend to multiple metrics including ratios and counts.
Confidence intervals are reliable for both absolute and relative increments of a metric.
The framework supports decisions about when to stop an AB-test.
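As an illustration of the central-limit-theorem approach for an absolute increment, the sketch below computes a normal-approximation interval for the difference in mean event counts per user between two groups. It is a generic textbook construction, not one of the paper's three algorithms; the function name `clt_absolute_diff_ci` and the toy data are assumptions made for this example.

```python
import math
from statistics import NormalDist, mean, variance


def clt_absolute_diff_ci(sample_a, sample_b, alpha=0.05):
    """Normal-approximation CI for mean(sample_b) - mean(sample_a).

    Each sample is a list of per-user values (for example, event counts).
    Assumes independent users and samples large enough for the CLT to apply.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    diff = mean(sample_b) - mean(sample_a)
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return diff - z * se, diff + z * se


# Toy per-user counts (hypothetical data, far too small for a real test).
group_a = [0, 1, 2, 3, 4]
group_b = [1, 2, 3, 4, 5]
low, high = clt_absolute_diff_ci(group_a, group_b)
print(f"95% CI for the absolute increment: [{low:.3f}, {high:.3f}]")
```

If the interval contains zero, the observed difference is not yet distinguishable from noise at the chosen level, which is one reason to keep a test running.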
Abstract
AB-testing is a very popular technique in web companies since it makes it possible to accurately predict the impact of a modification with the simplicity of a random split across users. One of the critical aspects of an AB-test is its duration, and it is important to reliably compute confidence intervals associated with the metric of interest to know when to stop the test. In this paper, we define a clean mathematical framework to model the AB-test process. We then propose three algorithms based on bootstrapping and on the central limit theorem to compute reliable confidence intervals which extend to metrics other than the common probabilities of success. They apply to both absolute and relative increments of the most used comparison metrics, including the number of occurrences of a particular event and a click-through rate involving a ratio.
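To make the bootstrapping idea concrete for a ratio metric, here is a minimal percentile-bootstrap sketch for the relative increment of a click-through rate. It is a generic illustration and not the paper's algorithm; `simulate_users`, `bootstrap_relative_ctr_ci`, the simulated rates, and the choice of users as the resampling unit are all assumptions made for this example.

```python
import random


def simulate_users(n_users, ctr, rng):
    """Hypothetical data generator: per-user view and click counts."""
    views = [rng.randint(1, 10) for _ in range(n_users)]
    clicks = [sum(rng.random() < ctr for _ in range(v)) for v in views]
    return clicks, views


def bootstrap_relative_ctr_ci(clicks_a, views_a, clicks_b, views_b,
                              n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for CTR_B / CTR_A - 1.

    Users are resampled with replacement, so each user's clicks and views
    stay paired. Assumes every resample contains at least one click in A.
    """
    rng = random.Random(seed)

    def resampled_ctr(clicks, views):
        n = len(clicks)
        idx = rng.choices(range(n), k=n)  # resample users with replacement
        return sum(clicks[i] for i in idx) / sum(views[i] for i in idx)

    stats = sorted(
        resampled_ctr(clicks_b, views_b) / resampled_ctr(clicks_a, views_a) - 1.0
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


rng = random.Random(42)
clicks_a, views_a = simulate_users(500, 0.10, rng)  # control group
clicks_b, views_b = simulate_users(500, 0.12, rng)  # variant group
low, high = bootstrap_relative_ctr_ci(clicks_a, views_a, clicks_b, views_b)
print(f"95% CI for the relative CTR increment: [{low:.3f}, {high:.3f}]")
```

Resampling whole users rather than individual views keeps the numerator and denominator of the ratio correlated in the same way as in the original data, which is the main reason a ratio metric needs more care than a simple success probability.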
Taxonomy
Topics: Software Testing and Debugging Techniques · Anomaly Detection Techniques and Applications
