A Permutation-free Kernel Two-Sample Test
Shubhanshu Shekhar, Ilmun Kim, Aaditya Ramdas

TL;DR
This paper introduces the cross-MMD, a permutation-free, quadratic-time kernel two-sample test that is computationally efficient, statistically valid, and maintains high power, especially for large datasets.
Contribution
The paper proposes the cross-MMD, a novel permutation-free test statistic based on sample-splitting and studentization, with proven asymptotic normality and optimal power properties.
Findings
Cross-MMD has a limiting Gaussian distribution under the null hypothesis.
The test is consistent against any fixed alternative.
It offers significant computational speedup with minimal power loss for large samples.
Abstract
The kernel Maximum Mean Discrepancy (MMD) is a popular multivariate distance metric between distributions that has found utility in two-sample testing. The usual kernel-MMD test statistic is a degenerate U-statistic under the null, and thus it has an intractable limiting distribution. Hence, to design a level-α test, one usually selects the rejection threshold as the (1 − α)-quantile of the permutation distribution. The resulting nonparametric test has finite-sample validity but suffers from large computational cost, since every permutation takes quadratic time. We propose the cross-MMD, a new quadratic-time MMD test statistic based on sample-splitting and studentization. We prove that under mild assumptions, the cross-MMD has a limiting standard Gaussian distribution under the null. Importantly, we also show that the resulting test is consistent against any fixed…
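The sample-splitting and studentization idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' reference implementation: the Gaussian kernel, the fixed `bandwidth`, the even split of each sample, the Welch-style variance estimate, and the function names `gaussian_kernel` and `cross_mmd_test` are all assumptions made for this example, and the exact normalization used in the paper may differ.

```python
import numpy as np
from statistics import NormalDist


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def cross_mmd_test(x, y, alpha=0.05, bandwidth=1.0):
    """Sketch of a studentized, sample-split MMD test (assumed form).

    Returns (statistic, reject), where reject is True when the statistic
    exceeds the (1 - alpha) standard Gaussian quantile.
    """
    n1, m1 = len(x) // 2, len(y) // 2
    x1, x2 = x[:n1], x[n1:]  # first / second half of X
    y1, y2 = y[:m1], y[m1:]  # first / second half of Y

    # Witness function estimated from the second halves, evaluated at
    # each first-half point: g(z) = mean_k k(z, x2_k) - mean_k k(z, y2_k).
    gx = (gaussian_kernel(x1, x2, bandwidth).mean(axis=1)
          - gaussian_kernel(x1, y2, bandwidth).mean(axis=1))
    gy = (gaussian_kernel(y1, x2, bandwidth).mean(axis=1)
          - gaussian_kernel(y1, y2, bandwidth).mean(axis=1))

    # Cross statistic: inner product of the two half-sample mean
    # embedding differences, written as a difference of sample means.
    u = gx.mean() - gy.mean()

    # Studentization: Welch-style variance estimate of u, treating the
    # second halves as fixed (an assumption of this sketch).
    var = gx.var(ddof=1) / n1 + gy.var(ddof=1) / m1
    stat = float(u / np.sqrt(var))

    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return stat, bool(stat > threshold)
```

The whole computation needs a single pass over four kernel matrices, so the cost is quadratic in the sample size with no permutation loop; the rejection threshold comes directly from the standard Gaussian quantile.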
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models
Methods: Test
