Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice
Ryan Kessler, James McQueen, Miikka Rokkanen

TL;DR
This paper presents a theoretical framework for sample splitting in A/B testing, analyzing bias, variance, and confidence intervals to improve evaluation of new testing methodologies.
Contribution
It introduces a comprehensive theoretical analysis of sample splitting in A/B testing, including bias correction, asymptotic distributions, and practical implementation guidance.
Findings
Sample-split estimators are biased for full-sample performance but consistent for sample-split analogues.
Derived asymptotic distributions enable valid confidence intervals.
Simulation results validate theoretical insights and inform practical implementation.
Abstract
We develop a theoretical framework for sample splitting in A/B testing environments, where data for each test are partitioned into two splits to measure methodological performance when the true impacts of tests are unobserved. We show that sample-split estimators are generally biased for full-sample performance but consistently estimate sample-split analogues of it. We derive their asymptotic distributions, construct valid confidence intervals, and characterize the bias-variance trade-offs underlying sample-split design choices. We validate our theoretical results through simulations and provide implementation guidance for A/B testing products seeking to evaluate new estimators and decision rules.
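A minimal simulation sketch of the sample-splitting idea described in the abstract, under assumptions that are not taken from the paper: true effects drawn from a normal distribution, unit-variance outcomes, a hypothetical "launch if the estimated effect is positive" decision rule, and performance measured as the average true impact of launched tests. One split makes the decision and the other, independent split evaluates it. The function name and all parameters are illustrative.

```python
import numpy as np

def simulate_sample_split(n_tests=20000, n_per_arm=1000, effect_sd=0.05, seed=0):
    """Compare sample-split and full-sample performance measures.

    All modelling choices here are illustrative assumptions,
    not the setup used in the paper.
    """
    rng = np.random.default_rng(seed)

    # True per-test effects; unobserved in practice, known only in simulation.
    true_effects = rng.normal(0.0, effect_sd, size=n_tests)

    # Each test's data are partitioned into two equal, independent splits.
    half = n_per_arm // 2
    # Standard error of a difference in means with unit-variance outcomes.
    se_half = np.sqrt(2.0 / half)
    est_a = true_effects + rng.normal(0.0, se_half, size=n_tests)
    est_b = true_effects + rng.normal(0.0, se_half, size=n_tests)
    # With equal-size splits, the full-sample estimate is their average.
    est_full = (est_a + est_b) / 2.0

    # Hypothetical decision rule: launch when the estimate is positive.
    launch_split = est_a > 0
    launch_full = est_full > 0

    return {
        # Feasible estimator: split A decides, split B evaluates.
        "split_estimate": float(np.mean(launch_split * est_b)),
        # Sample-split analogue of performance (needs true effects).
        "split_target": float(np.mean(launch_split * true_effects)),
        # Full-sample performance (needs true effects).
        "full_target": float(np.mean(launch_full * true_effects)),
        # Naive estimator that reuses the same data to decide and evaluate.
        "naive_full": float(np.mean(launch_full * est_full)),
    }

if __name__ == "__main__":
    for name, value in simulate_sample_split().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting the sample-split estimate tracks the sample-split target, which sits below the full-sample target because decisions made on half the data are noisier. The naive estimator overstates performance because it uses the same data to select and to evaluate.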
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Distributed Sensor Networks and Detection Algorithms
