Computationally efficient permutation tests for the multivariate two-sample problem based on energy distance or maximum mean discrepancy statistics

Elias Chaibub Neto

arXiv:2406.06488·stat.CO·June 11, 2024

TL;DR

This paper introduces a new permutation testing algorithm for multivariate two-sample tests based on energy distance or MMD, significantly reducing computation time while maintaining statistical validity and power.

Contribution

The author proposes a novel permutation algorithm that pre-computes smaller matrices, achieving large speedups without losing statistical power or validity.

Findings

01 Significant computational speedups demonstrated in experiments

02 Maintains finite-sample validity of tests

03 Comparable statistical power to existing methods

Abstract

Non-parametric two-sample tests based on energy distance or maximum mean discrepancy are widely used statistical tests for comparing multivariate data from two populations. While these tests enjoy desirable statistical properties, their test statistics can be expensive to compute, as they require the computation of 3 distinct Euclidean distance (or kernel) matrices between samples, where the time complexity of each of these computations (namely, $O(n_x^2 p)$, $O(n_y^2 p)$, and $O(n_x n_y p)$) scales quadratically with the number of samples ($n_x$, $n_y$) and linearly with the number of variables ($p$). Since the standard permutation test requires repeated re-computations of these expensive statistics, its application to large datasets can become infeasible. While several statistical approaches have been proposed to mitigate this issue, they all sacrifice desirable statistical…
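To make the cost described in the abstract concrete, here is a minimal sketch of the standard (naive) permutation test with an energy distance statistic. It is the baseline whose repeated re-computation the abstract describes, not the paper's proposed algorithm. The function names (`pairwise_dist`, `energy_distance`, `naive_permutation_test`), the V-statistic form of the energy distance, and the defaults `n_perm=200` and `seed=0` are assumptions made for illustration.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (n_a x p) and b (n_b x p)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def energy_distance(x, y):
    """V-statistic form: 2 * mean|x - y| - mean|x - x'| - mean|y - y'|."""
    d_xy = pairwise_dist(x, y)  # O(n_x * n_y * p)
    d_xx = pairwise_dist(x, x)  # O(n_x^2 * p)
    d_yy = pairwise_dist(y, y)  # O(n_y^2 * p)
    return 2.0 * d_xy.mean() - d_xx.mean() - d_yy.mean()


def naive_permutation_test(x, y, n_perm=200, seed=0):
    """Recompute all three distance matrices for every permutation."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = energy_distance(x, y)
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled rows and split them into two pseudo-samples.
        idx = rng.permutation(pooled.shape[0])
        stat = energy_distance(pooled[idx[:n_x]], pooled[idx[n_x:]])
        if stat >= observed:
            count += 1
    # Counting the observed statistic itself keeps the p-value in (0, 1].
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Illustrative data only: two Gaussian samples with shifted means.
rng = np.random.default_rng(1)
x = rng.normal(size=(30, 3))
y = rng.normal(loc=3.0, size=(30, 3))
observed, p_value = naive_permutation_test(x, y)
```

Each call to `energy_distance` inside the loop rebuilds the three distance matrices from scratch, so the total cost grows with `n_perm` times the quadratic terms listed in the abstract. That repeated work is the bottleneck the paper sets out to remove.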

Taxonomy

Topics: Statistical Distribution Estimation and Applications · Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models