A Decision Theoretic Approach to A/B Testing
David Goldberg, James E. Johndrow

TL;DR
This paper introduces a decision-theoretic framework for A/B testing that automates the choice of the feature-adoption threshold by minimizing Bayes risk, which may improve on traditional fixed p-value thresholds.
Contribution
It proposes a novel decision-theoretic approach to determine A/B testing thresholds, moving beyond arbitrary p-value cutoffs and enabling automated, optimal decision-making.
Findings
The 0.05 p-value threshold may be overly conservative in some cases.
The proposed method can adapt thresholds based on data and loss functions.
Traditional thresholds may serve as ad-hoc solutions for multiple testing issues.
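To make the second finding concrete, here is a minimal sketch of how a threshold can follow from a prior and a loss function. It is not the authors' model. It assumes a textbook conjugate-normal setup: the true effect has a normal prior, the observed difference is normal around the true effect, adopting the feature gains the true effect minus a fixed cost, and not adopting gains zero. All names and parameters (`prior_mean`, `prior_sd`, `cost`, `implied_p_threshold`) are hypothetical and chosen for illustration only.

```python
import math


def posterior_mean(d_obs, se, prior_mean, prior_sd):
    """Posterior mean of the effect under a normal prior and normal likelihood."""
    w_prior = 1.0 / prior_sd**2   # prior precision
    w_data = 1.0 / se**2          # data precision
    return (w_prior * prior_mean + w_data * d_obs) / (w_prior + w_data)


def bayes_adopt(d_obs, se, prior_mean, prior_sd, cost=0.0):
    """Adopt when the expected gain (posterior mean effect) exceeds the cost.

    Under the assumed linear loss this rule minimizes posterior expected loss.
    """
    return posterior_mean(d_obs, se, prior_mean, prior_sd) > cost


def implied_p_threshold(se, prior_mean, prior_sd, cost=0.0):
    """One-sided p-value cutoff equivalent to the Bayes rule above.

    Solving posterior_mean(d) > cost for d gives the critical difference
    d_star; the cutoff is the upper normal tail probability at d_star / se.
    """
    d_star = cost + (se**2 / prior_sd**2) * (cost - prior_mean)
    z_star = d_star / se
    return 0.5 * math.erfc(z_star / math.sqrt(2.0))


if __name__ == "__main__":
    # A pessimistic prior (mean below zero) yields a stricter cutoff than a
    # neutral prior, and the cutoff moves with the noise level `se`.
    for se in (0.5, 1.0, 2.0):
        print(se, implied_p_threshold(se, prior_mean=-1.0, prior_sd=1.0))
```

Under these assumptions the equivalent p-value cutoff is not fixed: it shifts with the prior, the noise level, and the cost of adoption, which is the sense in which a threshold "adapts" to data and loss.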
Abstract
A/B testing is ubiquitous within the machine learning and data science operations of internet companies. Generically, the idea is to perform a statistical test of the hypothesis that a new feature is better than the existing platform (for example, that it results in higher revenue). If the p-value for the test is below some pre-defined threshold (often 0.05), the new feature is implemented. The difficulty of choosing an appropriate threshold has been noted before, particularly because dependent tests are often done sequentially, leading some to propose control of the false discovery rate (FDR) rather than use of a single, universal threshold. However, it is still necessary to make an arbitrary choice of the level at which to control FDR. Here we suggest a decision-theoretic approach to determining whether to adopt a new feature, which enables automated selection of an appropriate…
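The conventional procedure the abstract describes can be sketched as follows. This is an illustration only, assuming a one-sided z-test on a difference in means with a known standard error; the function names and the normal approximation are assumptions, not details taken from the paper.

```python
import math


def one_sided_p_value(mean_new, mean_old, se_diff):
    """P-value for H0: new <= old against H1: new > old (normal approximation)."""
    z = (mean_new - mean_old) / se_diff
    # Upper-tail probability P(Z >= z) for a standard normal Z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def adopt_fixed_threshold(mean_new, mean_old, se_diff, alpha=0.05):
    """Adopt the new feature when the p-value falls below a fixed cutoff."""
    return one_sided_p_value(mean_new, mean_old, se_diff) < alpha


if __name__ == "__main__":
    # Example: revenue per user of 10.2 (new) vs 10.0 (old), standard error 0.1.
    print(one_sided_p_value(10.2, 10.0, 0.1))
    print(adopt_fixed_threshold(10.2, 10.0, 0.1))
```

The cutoff `alpha` is a free parameter here, which is the arbitrary choice the abstract refers to.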
Taxonomy
Topics: Statistical Methods in Clinical Trials · Machine Learning and Data Classification · Statistical Methods and Inference
