Surrogate-Based Prevalence Measurement for Large-Scale A/B Testing
Zehao Xu, Tony Paek, Kevin O'Sullivan, Attila Dobi

TL;DR
This paper introduces a scalable surrogate-based framework for measuring content prevalence in large-scale A/B testing, reducing costs and latency by decoupling expensive labeling from the evaluation process.
Contribution
The paper presents a surrogate-based approach that calibrates a surrogate signal to reference labels offline and then uses only impression logs to estimate prevalence, enabling scalable measurement for A/B tests.
Findings
Surrogate estimates closely match reference labels in large-scale tests.
Framework reduces labeling costs and improves measurement speed.
Effective for both arm-level prevalence and treatment-control differences.
Abstract
Online media platforms often need to measure how frequently users are exposed to specific content attributes in order to evaluate trade-offs in A/B experiments. A direct approach is to sample content, label it using a high-quality rubric (e.g., an expert-reviewed LLM prompt), and estimate impression-weighted prevalence. However, repeatedly running such labeling for every experiment arm and segment is too costly and slow to serve as a default measurement at scale. We present a scalable surrogate-based prevalence measurement framework that decouples expensive labeling from per-experiment evaluation. The framework calibrates a surrogate signal to reference labels offline and then uses only impression logs to estimate prevalence for arbitrary experiment arms and segments. We instantiate this framework using score bucketing as the surrogate: we discretize a model score into…
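The two-stage idea in the abstract (calibrate offline, then estimate from impression logs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names (bucket_of, calibrate, estimate_prevalence), the fixed bucket edges, and the toy data are hypothetical, and the abstract is truncated before it specifies how the score is actually discretized.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_of(score, edges):
    """Map a model score to a bucket index, given sorted bucket edges (assumed)."""
    return bisect_right(edges, score)


def calibrate(labeled, edges):
    """Offline step: estimate the positive-label rate within each score bucket.

    labeled: iterable of (score, label, weight), with label in {0, 1} taken
    from the reference rubric and weight the sampling or impression weight.
    """
    positives = defaultdict(float)
    totals = defaultdict(float)
    for score, label, weight in labeled:
        b = bucket_of(score, edges)
        totals[b] += weight
        positives[b] += weight * label
    return {b: positives[b] / totals[b] for b in totals}


def estimate_prevalence(impressions, edges, rates):
    """Per-experiment step: impression-weighted average of bucket rates.

    impressions: iterable of (score, impression_count) from logs only.
    Raises KeyError if a bucket was never seen during calibration.
    """
    numerator = 0.0
    denominator = 0.0
    for score, count in impressions:
        numerator += count * rates[bucket_of(score, edges)]
        denominator += count
    return numerator / denominator


# Toy data (fabricated purely for illustration): two buckets split at 0.5.
edges = [0.5]
labeled = [
    (0.1, 0, 1.0), (0.2, 0, 1.0), (0.3, 1, 1.0), (0.4, 0, 1.0),
    (0.6, 1, 1.0), (0.7, 1, 1.0), (0.8, 0, 1.0), (0.9, 1, 1.0),
]
rates = calibrate(labeled, edges)

control = [(0.2, 80), (0.9, 20)]
treatment = [(0.2, 60), (0.9, 40)]
p_control = estimate_prevalence(control, edges, rates)
p_treatment = estimate_prevalence(treatment, edges, rates)
delta = p_treatment - p_control
```

In this sketch the expensive labels are consumed once, in calibrate; each arm or segment afterwards needs only its logged scores and impression counts. The estimate is only as good as the assumption that per-bucket label rates stay stable across arms, which is the kind of condition a calibration check against reference labels would need to confirm.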
Taxonomy
Topics: Advanced Causal Inference Techniques · Mobile Crowdsensing and Crowdsourcing · Hate Speech and Cyberbullying Detection
