Beyond Normality: Reliable A/B Testing with Non-Gaussian Data
Junpeng Gong, Chunkai Wang, Hao Li, Jinyong Ma, Haoxuan Li, Xu He

TL;DR
This paper addresses the unreliability of traditional t-tests in A/B testing with non-Gaussian data, proposing explicit sample size formulas and an Edgeworth correction to improve test validity in real-world scenarios.
Contribution
It introduces explicit formulas for minimum sample sizes and an Edgeworth-based correction to enhance A/B test reliability under non-normal data distributions.
Findings
Traditional t-tests often require hundreds of millions of samples for reliability.
Edgeworth correction improves p-value accuracy with limited samples.
Theoretical thresholds are validated on real-world A/B testing platforms.
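To make the idea of an Edgeworth correction concrete, here is a minimal sketch based on the classical first-order Edgeworth expansion for a one-sample studentized mean, P(T <= x) ≈ Φ(x) + γ(2x² + 1)φ(x) / (6√n), where γ is the skewness. This is the textbook one-sample form, not the paper's two-sample formula, which the summary does not reproduce; the function names `edgeworth_cdf` and `one_sample_pvalues` are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def edgeworth_cdf(x, skew, n):
    """First-order Edgeworth approximation to P(T <= x) for a one-sample
    studentized mean with population skewness `skew` and sample size `n`."""
    std = NormalDist()
    correction = skew * (2.0 * x * x + 1.0) / (6.0 * math.sqrt(n)) * std.pdf(x)
    return std.cdf(x) + correction


def one_sample_pvalues(data, mu0=0.0):
    """Return (t, naive lower-tail p-value, Edgeworth-corrected lower-tail p-value).

    The naive p-value uses the normal approximation; the corrected one plugs
    the sample skewness into the expansion above (an illustrative choice).
    """
    n = len(data)
    mean = fmean(data)
    centered = [v - mean for v in data]
    m2 = sum(c * c for c in centered) / n      # second central moment
    m3 = sum(c ** 3 for c in centered) / n     # third central moment
    skew = m3 / m2 ** 1.5                      # sample skewness
    s = math.sqrt(m2 * n / (n - 1))            # unbiased-variance standard deviation
    t = math.sqrt(n) * (mean - mu0) / s
    naive = NormalDist().cdf(t)
    # The expansion is not a true CDF, so clip it to [0, 1].
    corrected = min(1.0, max(0.0, edgeworth_cdf(t, skew, n)))
    return t, naive, corrected
```

For right-skewed data the correction term is positive, so the corrected lower-tail probability exceeds the normal one; the gap shrinks at rate 1/√n, which is why the adjustment matters most when samples are limited.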
Abstract
A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise t-test to compare outcomes between the treatment and control groups, thereby assessing the effectiveness of a given strategy. To be trustworthy, these experiments must keep Type I error (i.e., false positive rate) under control; otherwise, we may launch harmful strategies. However, in real-world applications, we find that A/B testing often fails to deliver reliable results. When the data distribution departs from normality or when the treatment and control groups differ in sample size, the commonly used pairwise t-test is no longer trustworthy. In this paper, we quantify how skewed, long-tailed data and unequal allocation distort error rates and derive explicit…
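The claim that skew and unequal allocation distort error rates can be checked with a small A/A simulation. The sketch below is an illustration, not the paper's experiment: both groups are drawn from the same lognormal distribution (so the null hypothesis is true), Welch's statistic is computed by hand, and the normal critical value stands in for the t quantile. The function names, the lognormal choice, and all default parameters are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_statistic(x, y):
    """Welch's two-sample statistic for the difference in means."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (fmean(x) - fmean(y)) / se


def simulated_type1_error(n_treat, n_ctrl, n_sims=2000, alpha=0.05,
                          sigma=1.5, seed=0):
    """Fraction of A/A experiments that a two-sided test rejects at level alpha.

    Both groups share one lognormal distribution, so every rejection is a
    false positive. A reliable test should return a value close to alpha.
    """
    rng = random.Random(seed)
    # Large-sample stand-in for the t quantile (an assumption of this sketch).
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_sims):
        treat = [rng.lognormvariate(0.0, sigma) for _ in range(n_treat)]
        ctrl = [rng.lognormvariate(0.0, sigma) for _ in range(n_ctrl)]
        if abs(welch_statistic(treat, ctrl)) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Unequal allocation: a small treatment group against a large control group.
    print(simulated_type1_error(n_treat=50, n_ctrl=500))
```

Comparing the returned rate with the nominal `alpha` for different group sizes and values of `sigma` gives a rough sense of how far the realized false positive rate drifts from its target.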
