Fast Two-Sample Testing with Analytic Representations of Probability Measures
Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton

TL;DR
This paper introduces fast, nonparametric two-sample tests using analytic function representations of distributions, offering improved speed and power over existing methods, especially in high-dimensional settings.
Contribution
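The paper presents two linear-time two-sample tests based on analytic representations of distributions: one uses smoothed empirical characteristic functions, the other uses embeddings in a reproducing kernel Hilbert space. Both are consistent against a larger class of alternatives than earlier linear-time tests and are much faster than quadratic-time kernel-based or energy distance-based tests.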
Findings
Tests are consistent against a larger class of alternatives than previous linear-time tests based on non-smoothed empirical characteristic functions.
Experiments show better power/time tradeoff than competing methods.
Performance remains strong in high-dimensional and complex scenarios.
Abstract
We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world…
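The second test described in the abstract (distribution embeddings evaluated at a finite number of random locations) can be illustrated with a minimal sketch. The abstract does not give the exact statistic, so the following are assumptions made for illustration: a Gaussian kernel, equal sample sizes, locations drawn from a standard normal, and a Hotelling-style normalized statistic compared against a chi-square distribution with J degrees of freedom. The function names `gaussian_kernel` and `mean_embedding_statistic` are hypothetical, not taken from the authors' code.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (J, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_embedding_statistic(x, y, locations, bandwidth=1.0, reg=1e-8):
    """Normalized difference of kernel mean embeddings at J test locations.

    Assumes x and y have the same number of rows. Cost is linear in n:
    each sample point is compared only with the J locations.
    """
    n = x.shape[0]
    # z[i, j] = k(x_i, T_j) - k(y_i, T_j)
    z = gaussian_kernel(x, locations, bandwidth) - gaussian_kernel(y, locations, bandwidth)
    mean = z.mean(axis=0)
    # Small ridge term (an assumption) keeps the covariance invertible.
    cov = np.atleast_2d(np.cov(z, rowvar=False)) + reg * np.eye(z.shape[1])
    return float(n * mean @ np.linalg.solve(cov, mean))


rng = np.random.default_rng(0)
n, d, num_locations = 2000, 2, 5
locations = rng.standard_normal((num_locations, d))  # random test locations

# Same distribution: statistic should behave like a chi-square(J) draw.
stat_same = mean_embedding_statistic(
    rng.standard_normal((n, d)), rng.standard_normal((n, d)), locations
)

# Mean-shifted distribution: statistic should be much larger.
stat_shifted = mean_embedding_statistic(
    rng.standard_normal((n, d)), rng.standard_normal((n, d)) + 1.0, locations
)

print(stat_same, stat_shifted)
```

Under these assumptions, the null hypothesis would be rejected at the 5% level when the statistic exceeds the 0.95 quantile of a chi-square distribution with J degrees of freedom (about 11.07 for J = 5). Using only a handful of locations is what keeps the cost linear in the sample size.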
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Statistical Methods and Inference · Algorithms and Data Compression
