A two-sample test based on averaged Wilcoxon rank sums over interpoint distances
Annika Betken, Aljosa Marjanovic, Katharina Proksch

TL;DR
This paper introduces a new two-sample test based on averaged Wilcoxon rank sums over interpoint distances, providing a robust, distribution-free method suitable for high-dimensional, low-sample-size data, with proven asymptotic properties and practical applications.
Contribution
It proposes a simple two-sample test based on interpoint distances whose statistic is asymptotically normal, proves consistency of the test, derives a variance approximation, and demonstrates its effectiveness in high-dimensional, low-sample-size settings.
Findings
The test statistic is asymptotically normal under the null hypothesis and under fixed alternatives.
The test shows good finite-sample performance in simulations.
The test is applied successfully to microarray genetic data.
Abstract
An important class of two-sample multivariate homogeneity tests is based on identifying differences between the distributions of interpoint distances. While generating distances from point clouds offers a straightforward and intuitive way to reduce dimensionality, it also introduces dependencies into the resulting distance samples. We propose a simple test based on Wilcoxon's rank sum statistic for which we prove asymptotic normality under the null hypothesis and fixed alternatives under mild conditions on the underlying distributions of the point clouds. Furthermore, we show consistency of the test and derive a variance approximation that allows the construction of a computationally feasible, distribution-free test with good finite-sample performance. The power and robustness of the test for high-dimensional data and low sample sizes are demonstrated by numerical simulations. Finally, we…
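The idea of ranking interpoint distances can be illustrated with a minimal sketch. The exact statistic and the variance approximation are not given on this page, so everything below is an assumption made for illustration: the function names `averaged_rank_sum_statistic` and `permutation_p_value` are hypothetical, the statistic is one plausible reading of "averaged Wilcoxon rank sums" (a Mann-Whitney estimate computed per base point, then averaged), and the calibration uses a plain permutation test in place of the asymptotic normal approximation described in the abstract.

```python
import numpy as np

def averaged_rank_sum_statistic(x, y):
    """Hypothetical statistic, not the paper's exact definition.

    For each base point x[i], compare the distances from x[i] to the other
    points of x with the distances from x[i] to the points of y using a
    Mann-Whitney estimate, then average over all base points.
    """
    m = x.shape[0]
    per_point = []
    for i in range(m):
        # Within-sample distances from x[i] (excluding x[i] itself).
        d_xx = np.linalg.norm(np.delete(x, i, axis=0) - x[i], axis=1)
        # Between-sample distances from x[i] to every point of y.
        d_xy = np.linalg.norm(y - x[i], axis=1)
        # Fraction of pairs with d_xx < d_xy; ties count one half.
        less = (d_xx[:, None] < d_xy[None, :]).mean()
        ties = (d_xx[:, None] == d_xy[None, :]).mean()
        per_point.append(less + 0.5 * ties)
    return float(np.mean(per_point))

def permutation_p_value(x, y, n_perm=200, seed=0):
    """Permutation calibration (an assumed stand-in for the paper's
    variance approximation): reshuffle the pooled sample and recompute."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    m = x.shape[0]
    # Values near 0.5 are expected when both samples share a distribution.
    observed = abs(averaged_rank_sum_statistic(x, y) - 0.5)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        xp, yp = pooled[idx[:m]], pooled[idx[m:]]
        if abs(averaged_rank_sum_statistic(xp, yp) - 0.5) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(10, 50))           # 10 points in 50 dimensions
    y = rng.normal(loc=3.0, size=(10, 50))  # shifted sample
    print(averaged_rank_sum_statistic(x, y))
    print(permutation_p_value(x, y))
```

The example data mimic a high-dimensional, low-sample-size setting (more coordinates than observations). Because only distances enter the statistic, the cost depends on the dimension only through the distance computation.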
Taxonomy
Topics: Advanced Statistical Methods and Models
