Equivalence of distance-based and RKHS-based statistics in hypothesis testing
Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, Kenji Fukumizu

TL;DR
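This paper establishes a unifying framework linking distance-based statistics (energy distances and distance covariances) with RKHS-based statistics (maximum mean discrepancies, MMD) for two-sample and independence testing. It shows that the two classes are equivalent and explores how the choice of kernel affects test power.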
Contribution
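It demonstrates that the energy distance equals an MMD whenever the energy distance is computed with a semimetric of negative type, and conversely that the MMD for any positive definite kernel can be read as an energy distance with respect to some negative-type semimetric. It also introduces a parametric family of kernels that can improve test performance.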
Findings
Energy distance computed with a semimetric of negative type is equivalent to MMD with the corresponding distance kernel.
A class of kernels can enhance test power over traditional energy distances.
The framework applies to two-sample and independence testing scenarios.
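The first finding can be checked numerically. The sketch below is an illustration, not the authors' code: it assumes the Euclidean distance as the semimetric rho, an arbitrary centre point z0, and the distance-kernel form k(x, y) = 0.5 * [rho(x, z0) + rho(y, z0) - rho(x, y)]. All function names are hypothetical. With biased (V-statistic) estimates, the empirical energy distance should come out as twice the empirical squared MMD, whichever z0 is chosen.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def energy_distance(x, y):
    """Biased (V-statistic) estimate of 2 E rho(X,Y) - E rho(X,X') - E rho(Y,Y')."""
    return (2.0 * pairwise_dist(x, y).mean()
            - pairwise_dist(x, x).mean()
            - pairwise_dist(y, y).mean())

def distance_kernel(a, b, z0):
    """Kernel induced by rho and centre z0 (assumed form, see lead-in)."""
    da = pairwise_dist(a, z0[None, :])        # shape (n, 1)
    db = pairwise_dist(b, z0[None, :]).T      # shape (1, m)
    return 0.5 * (da + db - pairwise_dist(a, b))

def mmd_squared(x, y, z0):
    """Biased estimate of E k(X,X') + E k(Y,Y') - 2 E k(X,Y)."""
    return (distance_kernel(x, x, z0).mean()
            + distance_kernel(y, y, z0).mean()
            - 2.0 * distance_kernel(x, y, z0).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))                  # sample from P
y = rng.normal(loc=0.5, size=(40, 3))         # sample from Q (shifted mean)
z0 = np.zeros(3)                              # arbitrary centre point

energy = energy_distance(x, y)
mmd2 = mmd_squared(x, y, z0)
print(np.isclose(energy, 2.0 * mmd2))
```

The terms involving z0 cancel in the MMD, which is why the choice of centre does not affect the result; swapping in a different z0 should leave `mmd2` unchanged up to floating-point error.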
Abstract
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the…
Taxonomy
Topics: Advanced Statistical Process Monitoring
