$t$-Testing the Waters: Empirically Validating Assumptions for Reliable A/B-Testing
Olivier Jeunen

TL;DR
This paper presents a practical method to empirically validate the normality assumption underlying $t$-tests in A/B testing, so that statistical conclusions remain reliable, especially with skewed or non-normal data.
Contribution
It introduces a resampling-based approach using Kolmogorov-Smirnov tests on A/A-tests to assess the validity of the $t$-test assumptions in web experiments.
Findings
The method effectively detects scenarios with inflated Type-I errors.
It provides a practical tool for validating $t$-test assumptions in real-world A/B tests.
The approach improves the robustness of experimental conclusions.
Abstract
A/B-tests are a cornerstone of experimental design on the web, with wide-ranging applications and use-cases. The statistical $t$-test comparing differences in means is the most commonly used method for assessing treatment effects, often justified through the Central Limit Theorem (CLT). The CLT ascertains that, as the sample size grows, the sampling distribution of the Average Treatment Effect converges to normality, making the $t$-test valid for sufficiently large sample sizes. When outcome measures are skewed or non-normal, quantifying what "sufficiently large" entails is not straightforward. To ensure that confidence intervals maintain proper coverage and that $p$-values accurately reflect the false positive rate, it is critical to validate this normality assumption. We propose a practical method to test this, by analysing repeatedly resampled A/A-tests. When the normality…
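The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It relies on a standard fact: when a test's assumptions hold and there is no true effect, its $p$-values are uniformly distributed on [0, 1], so a Kolmogorov-Smirnov test against the uniform distribution can flag departures. Several things here are assumptions made for illustration: the function names `aa_pvalues` and `ks_uniform`, the resampling scheme (shuffling one sample and splitting it in half), the normal approximation to the $t$ reference distribution, and the asymptotic series used for the KS $p$-value. In practice a statistics library such as SciPy would normally supply both tests.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def aa_pvalues(outcomes, n_resamples=500, seed=0):
    """Return p-values from repeatedly re-randomised A/A splits of one sample.

    Hypothetical helper: both pseudo-groups come from the same data, so any
    'effect' the test reports is a false positive by construction.
    """
    rng = random.Random(seed)
    data = list(outcomes)
    half = len(data) // 2
    std_normal = NormalDist()
    pvals = []
    for _ in range(n_resamples):
        rng.shuffle(data)  # random assignment with no true treatment effect
        a, b = data[:half], data[half:]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0.0:  # degenerate case: constant outcomes, no evidence at all
            pvals.append(1.0)
            continue
        stat = (fmean(a) - fmean(b)) / se  # Welch-style statistic
        # Assumption: normal approximation to the t distribution (large samples).
        pvals.append(2.0 * (1.0 - std_normal.cdf(abs(stat))))
    return pvals


def ks_uniform(pvals, terms=100):
    """One-sample KS test of p-values against Uniform(0, 1).

    Returns (D, approximate p-value), using the asymptotic Kolmogorov series
    with a common small-sample correction to the scaled statistic.
    """
    xs = sorted(pvals)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    tail = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return d, min(1.0, max(0.0, tail))


# Usage on synthetic, heavily skewed outcomes (illustrative data only).
rng = random.Random(1)
skewed = [rng.lognormvariate(0.0, 2.0) for _ in range(400)]
d, p = ks_uniform(aa_pvalues(skewed))
print(f"KS statistic = {d:.3f}, approximate p-value = {p:.3g}")
```

A small KS $p$-value suggests that the A/A $p$-values are not uniform, and hence that the sample may not yet be large enough for the normal approximation behind the $t$-test. Because the resampled splits reuse the same observations, the $p$-values are not fully independent, so the result is better read as a diagnostic than as an exact test.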
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Machine Learning and Algorithms · Water Quality and Resources Studies
