Unified Unbiased Variance Estimation for Maximum Mean Discrepancy: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses

Shijie Zhong; Yikun Yang; Da Gong; Jiangfeng Fu

arXiv:2601.13874·stat.ML·February 5, 2026

Open Access

TL;DR

This paper develops a unified finite-sample variance estimator for the MMD statistic that applies under both the null and alternative hypotheses and for balanced or imbalanced samples, and introduces an exact acceleration method for the univariate Laplacian kernel case.

Contribution

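It provides a unified finite-sample variance characterization for MMD across hypotheses and sample configurations, and proposes an exact acceleration algorithm that reduces computational complexity from O(n^2) to O(n log n) in the univariate Laplacian kernel case.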

Findings

01 Unified variance estimator valid under both the null and alternative hypotheses and for balanced or imbalanced samples.

02 Exact acceleration for the univariate Laplacian kernel case reduces computational complexity from O(n^2) to O(n log n).

03 Enhanced finite-sample performance in two-sample testing scenarios.
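To give a sense of why an exact O(n log n) computation is plausible for a univariate Laplacian kernel, the sketch below uses a standard sort-and-recurse trick for the within-sample kernel sum. It is an illustration under stated assumptions, not the paper's algorithm: the kernel form exp(-|x - y| / sigma), the bandwidth parameter `sigma`, and the function names are all hypothetical choices made here.

```python
import math

def laplacian_pair_sum(xs, sigma=1.0):
    """Sum of exp(-|x_i - x_j| / sigma) over unordered pairs i < j.

    After sorting, the contribution of all earlier points to point j
    satisfies S_j = (S_{j-1} + 1) * exp(-(s_j - s_{j-1}) / sigma),
    so a single linear pass suffices. Sorting dominates: O(n log n).
    """
    s = sorted(xs)
    total = 0.0
    running = 0.0  # S_j: kernel mass from all points to the left of s[j]
    for j in range(1, len(s)):
        decay = math.exp(-(s[j] - s[j - 1]) / sigma)  # exponent <= 0, so no overflow
        running = (running + 1.0) * decay
        total += running
    return total

def laplacian_pair_sum_naive(xs, sigma=1.0):
    """Reference O(n^2) double loop over the same pairs, for checking."""
    n = len(xs)
    return sum(
        math.exp(-abs(xs[i] - xs[j]) / sigma)
        for i in range(n)
        for j in range(i + 1, n)
    )
```

The two functions should agree up to floating-point rounding on any finite input, which is what "exact" means here: the speed-up comes from reorganizing the sum, not from approximating the kernel.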

Abstract

The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $O(n^{2})$ to $O(n \log n)$.
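For readers unfamiliar with the statistic, the following is a minimal sketch of the standard unbiased (U-statistic) estimate of squared MMD for two univariate samples of possibly different sizes. It is the textbook quadratic-time form, not the paper's variance estimator or its accelerated method; the Laplacian kernel parameterization exp(-|a - b| / sigma) and the names `laplacian` and `mmd2_unbiased` are assumptions made for illustration.

```python
import math

def laplacian(a, b, sigma=1.0):
    """Univariate Laplacian kernel (assumed parameterization)."""
    return math.exp(-abs(a - b) / sigma)

def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased estimate of squared MMD; sample sizes m and n may differ.

    Within-sample terms exclude the diagonal (i == j), which is what
    makes the estimate unbiased. Cost is O(m^2 + n^2 + m*n).
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    kxx = sum(laplacian(xs[i], xs[j], sigma)
              for i in range(m) for j in range(m) if i != j)
    kyy = sum(laplacian(ys[i], ys[j], sigma)
              for i in range(n) for j in range(n) if i != j)
    kxy = sum(laplacian(x, y, sigma) for x in xs for y in ys)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2.0 * kxy / (m * n)
```

Because the estimate is unbiased rather than constrained to be nonnegative, it can come out slightly negative when the two samples are drawn from the same distribution. This is one reason an accurate variance estimate matters for calibrating a test.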

Taxonomy

Topics: Statistical Methods and Inference · Statistical Methods in Clinical Trials · Imbalanced Data Classification Techniques