Evaluating Variance Estimates with Relative Efficiency

Kedar Karhadkar; Jack Klys; Daniel Ting; Artem Vorozhtsov; Houssam Nassif

arXiv:2511.15961·stat.ME·November 21, 2025

TL;DR

This paper introduces a method for evaluating the accuracy of variance estimates in experimentation platforms. It proposes a $t^2$-statistic that detects problems with confidence intervals more efficiently than the empirical false positive rate (FPR) from A/A tests.

Contribution

It presents a novel empirical approach to assess variance estimate effectiveness and introduces a $t^2$-statistic that improves detection efficiency over traditional FPR-based methods.

Findings

01. The $t^2$-statistic outperforms empirical FPR in detecting variance issues.

02. A/A testing can be enhanced by using more informative statistics.

03. The proposed method improves reliability diagnostics for experimentation platforms.

Abstract

Experimentation platforms in industry must often deal with customer trust issues. Platforms must prove the validity of their claims as well as catch issues that arise. Because confidence intervals are a central quantity estimated by experimentation platforms, their validity is of particular concern. To ensure confidence intervals are reliable, we must understand and diagnose when our variance estimates are biased or noisy, or when the confidence intervals may be incorrect. A common method for this is A/A testing, in which both the control and test arms receive the same treatment. One can then test if the empirical false positive rate (FPR) deviates substantially from the target FPR over many tests. However, this approach turns each A/A test into a simple binary random variable. It is an inefficient estimate of the FPR as it throws away information about the magnitude of each experiment…
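To make the contrast in the abstract concrete, here is a minimal sketch of an A/A simulation that reports both the empirical FPR and the average squared t-statistic. It is an illustration under stated assumptions, not the paper's procedure: the abstract is truncated before it defines the $t^2$-statistic, so averaging $t^2$ over tests (which should be close to 1 when the variance estimate is calibrated) is a stand-in chosen here. The function names, the Gaussian data, and the `variance_scale` knob are all hypothetical.

```python
import math
import random
import statistics


def simulate_aa_tests(num_tests=1000, n_per_arm=100, variance_scale=1.0, seed=0):
    """Return one t-statistic per simulated A/A test.

    Both arms are drawn from the same N(0, 1) distribution, so any
    "significant" result is a false positive. `variance_scale` is a
    hypothetical knob: values other than 1.0 mimic a platform whose
    variance estimate is biased by that multiplicative factor.
    """
    rng = random.Random(seed)
    t_stats = []
    for _ in range(num_tests):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        test = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        diff = statistics.fmean(test) - statistics.fmean(control)
        # Estimated variance of the difference in means (equal arm sizes).
        var_hat = (statistics.variance(control) + statistics.variance(test)) / n_per_arm
        var_hat *= variance_scale
        t_stats.append(diff / math.sqrt(var_hat))
    return t_stats


def empirical_fpr(t_stats, critical_value=1.96):
    """Fraction of A/A tests rejected: each test contributes only 0 or 1."""
    return sum(abs(t) > critical_value for t in t_stats) / len(t_stats)


def mean_t_squared(t_stats):
    """Average of t^2: each test contributes its full magnitude.

    Assumption of this sketch, not the paper's definition.
    """
    return statistics.fmean([t * t for t in t_stats])


if __name__ == "__main__":
    calibrated = simulate_aa_tests(variance_scale=1.0)
    biased = simulate_aa_tests(variance_scale=0.5)  # variance underestimated by half
    for label, ts in (("calibrated", calibrated), ("biased", biased)):
        print(label, "FPR:", empirical_fpr(ts), "mean t^2:", mean_t_squared(ts))
```

The FPR reduces every test to a reject/accept flag, while the mean of $t^2$ keeps the magnitude of each result. In this sketch, scaling the variance estimate by a factor scales the mean of $t^2$ by the inverse of that factor, so the bias appears directly in the summary rather than only through a shifted rejection rate.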

Taxonomy

Topics: Advanced Causal Inference Techniques · Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques