Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing
Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

TL;DR
This paper analyzes the power and efficiency of kernel and distance-based two-sample tests in high-dimensional settings, revealing their asymptotic optimality and tradeoffs between computational complexity and statistical power.
Contribution
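It formally characterizes the power of popular tests for general difference alternatives (GDA), such as the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and the Energy Distance with the Euclidean norm (eED), when the two distributions differ in their means in high dimensions. It also describes how their power changes with the computational budget.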
Findings
gMMD and eED have asymptotically equal power in high dimensions.
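Under mean difference alternatives, these tests match the power of specialized tests such as high-dimensional variants of Hotelling's t-test, while remaining consistent for general difference alternatives.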
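There is a clear tradeoff between computational complexity and statistical power across linear-time, subquadratic-time, and quadratic-time versions of these tests: more computation yields higher power.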
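To make the statistics named in these findings concrete, here is a minimal Python sketch of a quadratic-time gMMD estimate, a quadratic-time eED estimate, and a linear-time gMMD estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gmmd_u`, `eed_u`, `gmmd_linear`), the kernel parametrization exp(-||x - y||^2 / bandwidth^2), and the pairing scheme in the linear-time version are all choices made here for the example.

```python
import numpy as np


def _sq_dists(a, b):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    d = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by rounding


def _offdiag_mean(m):
    # Mean of the off-diagonal entries of a square matrix.
    n = m.shape[0]
    return (m.sum() - np.trace(m)) / (n * (n - 1))


def gmmd_u(x, y, bandwidth):
    # Quadratic-time unbiased estimate of the squared MMD with a Gaussian kernel.
    # Assumed parametrization: k(a, b) = exp(-||a - b||^2 / bandwidth^2).
    k = lambda a, b: np.exp(-_sq_dists(a, b) / bandwidth ** 2)
    return _offdiag_mean(k(x, x)) + _offdiag_mean(k(y, y)) - 2.0 * k(x, y).mean()


def eed_u(x, y):
    # Quadratic-time estimate of the energy distance with the Euclidean norm:
    # 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.
    d = lambda a, b: np.sqrt(_sq_dists(a, b))
    return 2.0 * d(x, y).mean() - _offdiag_mean(d(x, x)) - _offdiag_mean(d(y, y))


def gmmd_linear(x, y, bandwidth):
    # Linear-time estimate: average the kernel contrast over disjoint
    # consecutive pairs of samples instead of over all pairs.
    n = (min(len(x), len(y)) // 2) * 2
    x1, x2, y1, y2 = x[0:n:2], x[1:n:2], y[0:n:2], y[1:n:2]
    k = lambda a, b: np.exp(-((a - b) ** 2).sum(axis=1) / bandwidth ** 2)
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    return h.mean()
```

The two quadratic-time functions touch every pair of points, while `gmmd_linear` touches each point a constant number of times, so its estimate is cheaper but noisier. In practice a rejection threshold would still be needed, for example from a permutation procedure, which this sketch does not include.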
Abstract
Nonparametric two sample testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the most common settings as mean difference alternatives (MDA), for testing differences only in first moments, and general difference alternatives (GDA), which is about testing for any difference in distributions. A large number of test statistics have been proposed for both these settings. This paper connects three classes of statistics - high dimensional variants of Hotelling's t-test, statistics based on Reproducing Kernel Hilbert Spaces, and energy statistics based on pairwise distances. We ask the question: how much statistical power do popular kernel and distance based tests for GDA have when the unknown distributions differ in their means, compared to specialized…
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Distributed Sensor Networks and Detection Algorithms
