Surrogate-Based Prevalence Measurement for Large-Scale A/B Testing

Zehao Xu; Tony Paek; Kevin O'Sullivan; Attila Dobi

arXiv:2602.16111 · stat.AP · February 19, 2026

Open Access

TL;DR

This paper introduces a scalable surrogate-based framework for measuring content prevalence in large-scale A/B testing, reducing costs and latency by decoupling expensive labeling from the evaluation process.

Contribution

The paper presents a surrogate-based approach that calibrates a surrogate signal to reference labels offline and then uses only impression logs to estimate prevalence, so that measurement can scale to arbitrary experiment arms and segments.

Findings

01

Surrogate estimates closely match reference labels in large-scale tests.

02

Framework reduces labeling costs and improves measurement speed.

03

Effective for both arm-level prevalence and treatment-control differences.

Abstract

Online media platforms often need to measure how frequently users are exposed to specific content attributes in order to evaluate trade-offs in A/B experiments. A direct approach is to sample content, label it using a high-quality rubric (e.g., an expert-reviewed LLM prompt), and estimate impression-weighted prevalence. However, repeatedly running such labeling for every experiment arm and segment is too costly and slow to serve as a default measurement at scale. We present a scalable surrogate-based prevalence measurement framework that decouples expensive labeling from per-experiment evaluation. The framework calibrates a surrogate signal to reference labels offline and then uses only impression logs to estimate prevalence for arbitrary experiment arms and segments. We instantiate this framework using score bucketing as the surrogate: we discretize a model score into…
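
The abstract is truncated here, so the following is only a minimal sketch of how a score-bucketing surrogate could work, not the authors' implementation. It assumes fixed bucket edges, an unweighted per-bucket label rate computed once from an offline labeled sample, and an impression log of (score, impressions) pairs per arm. All function names, the bucket edges, and the toy data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_of(score, edges):
    """Return the index of the score bucket, given sorted interior edges."""
    return bisect_right(edges, score)


def calibrate(labeled, edges):
    """Offline step (assumed form): fraction of positive labels per bucket.

    labeled: iterable of (score, label) pairs with label in {0, 1}.
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for score, label in labeled:
        b = bucket_of(score, edges)
        positives[b] += label
        totals[b] += 1
    return {b: positives[b] / totals[b] for b in totals}


def estimate_prevalence(impression_log, edges, bucket_rates):
    """Per-experiment step: needs only logged (score, impressions) pairs.

    Raises KeyError if a logged score falls in a bucket with no labels.
    """
    weighted = 0.0
    impressions_total = 0.0
    for score, impressions in impression_log:
        weighted += bucket_rates[bucket_of(score, edges)] * impressions
        impressions_total += impressions
    return weighted / impressions_total


# Hypothetical toy data: three buckets split at scores 0.3 and 0.7.
edges = [0.3, 0.7]
labeled_sample = [(0.1, 0), (0.2, 0), (0.5, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
rates = calibrate(labeled_sample, edges)

control_log = [(0.1, 60), (0.5, 30), (0.9, 10)]
treatment_log = [(0.1, 70), (0.5, 20), (0.9, 10)]

control = estimate_prevalence(control_log, edges, rates)
treatment = estimate_prevalence(treatment_log, edges, rates)
delta = treatment - control  # treatment-control difference in prevalence
```

In this sketch the labeled sample is touched only in calibrate; each new arm or segment reuses the same bucket rates and only re-aggregates its impression log, which is the decoupling the abstract describes.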

Taxonomy

Topics: Advanced Causal Inference Techniques · Mobile Crowdsensing and Crowdsourcing · Hate Speech and Cyberbullying Detection