A Scalable Nyström-Based Kernel Two-Sample Test with Permutations
Antoine Chatalic, Marco Letizia, Nicolas Schreuder, Lorenzo Rosasco

TL;DR
This paper introduces a scalable kernel two-sample test based on a Nyström approximation of the maximum mean discrepancy (MMD). The test is computationally efficient while retaining statistical guarantees, which makes it suitable for large-scale scientific data.
Contribution
The paper proposes a Nyström-based kernel two-sample test with a finite-sample power guarantee and a minimax optimal separation rate, addressing the computational cost of MMD-based testing on large datasets.
Findings
Finite-sample bound on test power for well-separated distributions
Separation rate matches minimax optimal rate
Numerical experiments demonstrate applicability to real scientific data
Abstract
Two-sample hypothesis testing, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due to its flexibility and strong theoretical foundations. However, its use in large-scale scenarios is plagued by high computational costs. In this work, we use a Nyström approximation of the MMD to design a computationally efficient and practical testing algorithm while preserving statistical guarantees. Our main result is a finite-sample bound on the power of the proposed test for distributions that are sufficiently separated with respect to the MMD. The derived separation rate matches the known minimax optimal rate in this setting. We support our findings with a series of…
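The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the uniform sampling of landmarks from the pooled data, the default parameter values, and all function names (`gaussian_kernel`, `nystrom_features`, `nystrom_mmd_permutation_test`) are assumptions made here for illustration. The sketch maps every point to a low-dimensional Nyström feature vector, takes the squared distance between the two sample means of those features as an approximate squared MMD, and calibrates the test by permuting the pooled sample.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_features(z, landmarks, bandwidth, eps=1e-10):
    """Features phi such that phi @ phi.T equals K_nm K_mm^+ K_mn."""
    k_mm = gaussian_kernel(landmarks, landmarks, bandwidth)
    k_nm = gaussian_kernel(z, landmarks, bandwidth)
    evals, evecs = np.linalg.eigh(k_mm)
    keep = evals > eps * evals.max()  # drop numerically null directions
    return k_nm @ (evecs[:, keep] / np.sqrt(evals[keep]))


def nystrom_mmd_permutation_test(x, y, n_landmarks=50, bandwidth=1.0,
                                 n_permutations=200, alpha=0.05, seed=0):
    """Return (statistic, p_value, reject) for H0: same distribution."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n_x, n = len(x), len(z)

    # Assumption: landmarks are drawn uniformly from the pooled sample.
    idx = rng.choice(n, size=min(n_landmarks, n), replace=False)
    phi = nystrom_features(z, z[idx], bandwidth)

    def statistic(order):
        # Squared distance between the mean feature vectors of the two groups.
        diff = phi[order[:n_x]].mean(axis=0) - phi[order[n_x:]].mean(axis=0)
        return float(diff @ diff)

    observed = statistic(np.arange(n))
    permuted = np.array([statistic(rng.permutation(n)) for _ in range(n_permutations)])

    # Permutation p-value with the usual +1 correction.
    p_value = (1 + np.sum(permuted >= observed)) / (n_permutations + 1)
    return observed, p_value, bool(p_value <= alpha)
```

In this sketch the features are computed once, so each permutation only re-averages an n-by-m feature matrix, where m is the number of landmarks, instead of touching the full n-by-n kernel matrix. That is the source of the computational saving the abstract refers to; how the number of landmarks should scale with the sample size to keep the power guarantee is the subject of the paper and is not addressed here.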
