Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice

Ryan Kessler; James McQueen; Miikka Rokkanen

arXiv:2512.03366 · econ.EM · March 24, 2026

Open Access

TL;DR

This paper presents a theoretical framework for sample splitting in A/B testing, analyzing bias, variance, and confidence intervals to improve evaluation of new testing methodologies.

Contribution

The paper provides a theoretical analysis of sample splitting in A/B testing, covering bias correction, asymptotic distributions, and practical implementation guidance.

Findings

01

Sample-split estimators are biased for full-sample performance but consistent for sample-split analogues.

02

Derived asymptotic distributions enable valid confidence intervals.

03

Simulation results validate theoretical insights and inform practical implementation.

Abstract

We develop a theoretical framework for sample splitting in A/B testing environments, where data for each test are partitioned into two splits to measure methodological performance when the true impacts of tests are unobserved. We show that sample-split estimators are generally biased for full-sample performance but consistently estimate sample-split analogues of it. We derive their asymptotic distributions, construct valid confidence intervals, and characterize the bias-variance trade-offs underlying sample-split design choices. We validate our theoretical results through simulations and provide implementation guidance for A/B testing products seeking to evaluate new estimators and decision rules.
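To make the sample-splitting idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes normally distributed outcomes, a hypothetical decision rule ("launch if the estimated effect is positive"), and a hypothetical performance measure (average true effect captured by launched tests). The function name, parameters, and defaults are illustrative assumptions. Split A of each test drives the decision, and split B supplies an independent, unbiased proxy for the unobserved true effect.

```python
import numpy as np


def evaluate_rule_by_sample_splitting(
    n_tests=2000, n_per_arm=200, effect_sd=0.1, noise_sd=1.0, seed=0
):
    """Simulate many A/B tests and score a launch rule via sample splitting.

    All modelling choices here (normal outcomes, the "launch if estimate > 0"
    rule, the performance measure) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # True effects are unobserved in practice; the simulation keeps them
    # only so the estimate can be compared against its targets.
    true_effects = rng.normal(0.0, effect_sd, size=n_tests)

    # Unit-level outcomes, one row per test.
    control = rng.normal(0.0, noise_sd, size=(n_tests, n_per_arm))
    treated = rng.normal(true_effects[:, None], noise_sd, size=(n_tests, n_per_arm))

    # Units are i.i.d. in this simulation, so first half / second half
    # is equivalent to a random split.
    half = n_per_arm // 2
    est_a = treated[:, :half].mean(axis=1) - control[:, :half].mean(axis=1)
    est_b = treated[:, half:].mean(axis=1) - control[:, half:].mean(axis=1)
    est_full = treated.mean(axis=1) - control.mean(axis=1)

    launch_a = est_a > 0        # decision made on split A only
    launch_full = est_full > 0  # decision made on the full sample

    return {
        # Feasible in practice: split-B estimate stands in for the true effect.
        "split_estimate": float(np.mean(launch_a * est_b)),
        # What the split estimate targets: the rule's value at half sample size.
        "split_target": float(np.mean(launch_a * true_effects)),
        # Full-sample performance, which the split estimate is biased for.
        "full_target": float(np.mean(launch_full * true_effects)),
    }


if __name__ == "__main__":
    for name, value in evaluate_rule_by_sample_splitting().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, `split_estimate` should track `split_target` closely as the number of tests grows, while `full_target` differs because decisions based on the full sample are less noisy than decisions based on half of it. This gap is one way to see the bias the abstract describes.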

Taxonomy

Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Distributed Sensor Networks and Detection Algorithms