Unified Unbiased Variance Estimation for Maximum Mean Discrepancy: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses
Shijie Zhong, Yikun Yang, Da Gong, Jiangfeng Fu

TL;DR
This paper develops a unified finite-sample variance estimator for the MMD statistic that is robust across hypotheses and data imbalances, and introduces an efficient acceleration method for the Laplacian kernel case.
Contribution
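It provides a unified finite-sample variance characterization for the MMD statistic that holds under both the null and alternative hypotheses and for balanced or imbalanced samples, and it proposes an exact acceleration algorithm for the univariate Laplacian-kernel case that reduces computational complexity from O(n^2) to O(n log n).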
Findings
Unified variance estimator valid under different hypotheses and data configurations.
Exact acceleration reduces computational complexity from O(n^2) to O(n log n).
Enhanced finite-sample performance in two-sample testing scenarios.
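To illustrate how an exact O(n log n) computation is possible for a univariate Laplacian kernel, here is a minimal sketch based on a standard sorting recursion. It is not the paper's algorithm, which is not reproduced on this page. The function names `laplacian_pair_sum` and `mmd2_unbiased_fast`, the bandwidth parameter `sigma`, and the kernel form exp(-|a - b| / sigma) are assumptions made for this example. It accelerates only the MMD estimate itself, not the variance estimator.

```python
import math

def laplacian_pair_sum(values, sigma=1.0):
    """Sum of exp(-|a - b| / sigma) over all unordered pairs of `values`.

    After sorting, the contribution of all earlier points to point j obeys
    running_j = (running_{j-1} + 1) * exp(-(x_j - x_{j-1}) / sigma),
    so the loop is linear and the sort dominates at O(n log n).
    """
    xs = sorted(values)
    total = 0.0
    running = 0.0  # sum over i < j of exp(-(xs[j] - xs[i]) / sigma)
    for j in range(1, len(xs)):
        decay = math.exp(-(xs[j] - xs[j - 1]) / sigma)  # exponent <= 0, no overflow
        running = (running + 1.0) * decay
        total += running
    return total

def mmd2_unbiased_fast(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate for 1-D samples of possibly unequal sizes."""
    m, n = len(x), len(y)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two observations")
    sxx = laplacian_pair_sum(x, sigma)
    syy = laplacian_pair_sum(y, sigma)
    # Pairs in the pooled sample = pairs within x + pairs within y + cross pairs.
    sxy = laplacian_pair_sum(list(x) + list(y), sigma) - sxx - syy
    return 2.0 * sxx / (m * (m - 1)) + 2.0 * syy / (n * (n - 1)) - 2.0 * sxy / (m * n)
```

The cross-sample term is obtained by subtraction from the pooled sum, which keeps the sketch short. For very large samples this subtraction can lose precision, and a merged-sort sweep over both samples would be a more careful alternative.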
Abstract
The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from O(n^2) to O(n log n).
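For reference, the U-statistic form of the MMD estimate mentioned in the abstract can be written directly as a quadratic-time baseline. The sketch below is illustrative only: the names `laplacian_kernel` and `mmd2_unbiased`, the bandwidth `sigma`, and the kernel form exp(-|a - b| / sigma) are assumptions, and the paper's variance estimator is not reproduced here. The sample sizes m and n may differ, which corresponds to the imbalanced setting.

```python
import math

def laplacian_kernel(a, b, sigma=1.0):
    """Univariate Laplacian kernel exp(-|a - b| / sigma)."""
    return math.exp(-abs(a - b) / sigma)

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate via its U-statistic form, in O(m^2 + n^2 + mn) time."""
    m, n = len(x), len(y)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two observations")
    # Within-sample sums exclude the diagonal (i == j), which makes the estimate unbiased.
    kxx = sum(laplacian_kernel(x[i], x[j], sigma)
              for i in range(m) for j in range(m) if i != j)
    kyy = sum(laplacian_kernel(y[i], y[j], sigma)
              for i in range(n) for j in range(n) if i != j)
    # The cross-sample sum runs over all m * n pairs.
    kxy = sum(laplacian_kernel(a, b, sigma) for a in x for b in y)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2.0 * kxy / (m * n)
```

Because the diagonal terms are removed, the estimate can be slightly negative when the two samples come from the same distribution. That is expected behavior for the unbiased form.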
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods in Clinical Trials · Imbalanced Data Classification Techniques
