Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests

Pallavi Basu; Ron Berman

arXiv:2407.01036 · stat.ME · August 21, 2025

Open Access

TL;DR

This paper introduces a decision-theoretic, cost-aware ranking method for large-scale A/B testing that maximizes expected profit subject to a constraint on the cost-weighted false discovery rate (FDR), and reports better finite-sample performance than existing FDR-controlling methods.

Contribution

It develops an empirical Bayes framework in which a greedy knapsack algorithm ranks experiments by the ratio of expected lift to cost, using the local false discovery rate (lfdr) as a key statistic.

Findings

01. The oracle rule is valid and rank-optimal; the data-driven implementation is asymptotically valid in large-scale settings.

02. The method outperforms existing FDR-controlling methods in finite samples.

03. Application to Optimizely data shows significant business value.

Abstract

A/B testing is a core tool for decision-making in business experimentation, particularly in digital platforms and marketplaces. Practitioners often prioritize lift in performance metrics while seeking to control the costs of false discoveries. This paper develops a decision-theoretic framework for maximizing expected profit subject to a constraint on the cost-weighted false discovery rate (FDR). We propose an empirical Bayes approach that uses a greedy knapsack algorithm to rank experiments based on the ratio of expected lift to cost, incorporating the local false discovery rate (lfdr) as a key statistic. The resulting oracle rule is valid and rank-optimal. In large-scale settings, we establish the asymptotic validity of a data-driven implementation and demonstrate superior finite-sample performance over existing FDR-controlling methods. An application to A/B tests run on the Optimizely…
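The greedy knapsack ranking described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the paper's algorithm: the abstract does not give the exact ranking statistic, so the `value` and `weight` definitions, the `Experiment` fields, and the function name `greedy_rank` are all assumptions. The sketch treats the constraint "cost-weighted FDR at most alpha" as a knapsack budget, where each selected experiment contributes cost * (lfdr - alpha) to the budget and (1 - lfdr) * lift to expected profit.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    lift: float  # expected lift if the effect is real (assumed nonnegative)
    cost: float  # cost of a false discovery (assumed positive)
    lfdr: float  # local false discovery rate: posterior probability of a null


def greedy_rank(experiments, alpha):
    """Select experiments greedily by value-to-weight ratio (hypothetical sketch).

    Assumed formalization: selecting a set S keeps the cost-weighted FDR
    estimate sum(cost * lfdr) / sum(cost) at or below alpha exactly when
    sum(cost * (lfdr - alpha)) over S is at most zero.
    """
    def value(e):
        # Expected profit contribution of selecting e (assumption).
        return (1.0 - e.lfdr) * e.lift

    def weight(e):
        # Budget consumed by e; negative when lfdr is below alpha.
        return e.cost * (e.lfdr - alpha)

    # Experiments with lfdr <= alpha never hurt the constraint: take them all.
    free = sorted((e for e in experiments if weight(e) <= 0), key=lambda e: e.lfdr)
    costly = [e for e in experiments if weight(e) > 0]

    # The slack created by the free experiments is the knapsack capacity.
    capacity = -sum(weight(e) for e in free)
    selected = [e.name for e in free]

    # Rank the remaining experiments by expected profit per unit of budget.
    costly.sort(key=lambda e: value(e) / weight(e), reverse=True)
    used = 0.0
    for e in costly:
        if used + weight(e) > capacity:
            break  # stop at the first violation so the selection follows the ranking
        used += weight(e)
        selected.append(e.name)
    return selected


# Made-up inputs, purely for illustration.
experiments = [
    Experiment("A", lift=2.0, cost=1.0, lfdr=0.01),
    Experiment("B", lift=1.0, cost=1.0, lfdr=0.02),
    Experiment("C", lift=5.0, cost=1.0, lfdr=0.15),
    Experiment("D", lift=0.5, cost=4.0, lfdr=0.30),
]
selected = greedy_rank(experiments, alpha=0.10)
print(selected)
```

With these made-up inputs, A and B create slack because their lfdr is below alpha, C fits within that slack, and D is rejected because its high cost and lfdr would exceed the budget. Stopping at the first violation, rather than skipping ahead to smaller items, is a design choice that keeps the selected set a prefix of the ranking.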

Taxonomy

Topics: Infrastructure Maintenance and Monitoring · Advanced Statistical Process Monitoring