Accelerated Computation of a High Dimensional Kolmogorov-Smirnov Distance
Alex Hagen, Shane Jackson, James Kahn, Jan Strube, Isabel Haide, Karl Pazdernik, Connor Hainje

TL;DR
This paper introduces ddKS, a high-dimensional extension of the Kolmogorov-Smirnov test with an analytical significance calculation and fast algorithms, and shows that it outperforms existing tests on a variety of datasets.
Contribution
The paper presents the ddKS test with an analytical significance formula and efficient algorithms, including parallel and approximate methods, for high-dimensional data analysis.
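A minimal NumPy sketch of a Fasano-style d-dimensional statistic makes the construction concrete. It assumes that, as in Fasano's two-dimensional test, every pooled sample point serves as an origin and the score is the largest difference between the fraction of each sample lying in any of the 2^d orthants around that origin. The function name `ddks_statistic` is hypothetical, and this is not the authors' parallel implementation: it only processes all origins in one broadcast expression to illustrate the data-parallel structure such algorithms can exploit, at the cost of quadratic memory.

```python
import numpy as np


def ddks_statistic(p, q):
    """Fasano-style d-dimensional KS distance between samples p (n, d) and q (m, d).

    Hypothetical helper, assuming the orthant-based definition: every pooled
    point is an origin, space around it is split into 2**d orthants, and the
    score is the largest absolute difference between the fraction of p and
    the fraction of q in any orthant. All origins are handled at once by
    broadcasting, so memory grows roughly as (n + m)**2 * (d + 2**d).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim == 1:  # treat 1-D input as n points in one dimension
        p = p[:, None]
    if q.ndim == 1:
        q = q[:, None]
    if p.shape[1] != q.shape[1]:
        raise ValueError("p and q must have the same number of dimensions")

    d = p.shape[1]
    origins = np.concatenate([p, q])  # (k, d) with k = n + m
    weights = 2 ** np.arange(d)       # bit j set <=> coordinate j exceeds the origin's
    orthants = np.arange(2 ** d)

    def orthant_fractions(sample):
        # (k, s, d): is each coordinate of each point above each origin?
        # Ties go to the "not greater" side of that axis.
        above = (sample[None, :, :] > origins[:, None, :]).astype(np.int64)
        idx = above @ weights  # (k, s) orthant index of every point per origin
        # (k, 2**d): how many sample points fall in each orthant of each origin
        counts = (idx[:, :, None] == orthants).sum(axis=1)
        return counts / sample.shape[0]

    return float(np.abs(orthant_fractions(p) - orthant_fractions(q)).max())
```

In one dimension the two orthants are x ≤ origin and x > origin, so the score reduces to the usual two-sample KS statistic evaluated at the pooled points; for example, `ddks_statistic([1, 2, 3], [2.5, 4, 5])` gives 2/3. For larger samples, looping over origins in chunks keeps memory bounded while producing the same value.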
Findings
ddKS performs well across all tested datasets and dimensions.
The algorithms significantly reduce computation time for high-dimensional data.
ddKS outperforms Hotelling's T^2 and Kullback-Leibler divergence in power analysis.
Abstract
Statistical testing is widespread and critical for a variety of scientific disciplines. The advent of machine learning and the growth in computing power have increased interest in the analysis and statistical testing of multidimensional data. We extend the powerful Kolmogorov-Smirnov two-sample test to a high-dimensional form in a similar manner to Fasano (Fasano, 1987). We call our result the d-dimensional Kolmogorov-Smirnov test (ddKS) and provide three novel contributions therewith: we develop an analytical equation for the significance of a given ddKS score, we provide an algorithm for computation of ddKS on modern computing hardware that is of constant time complexity for small sample sizes and dimensions, and we provide two approximate calculations of ddKS: one that reduces the time complexity to linear at larger sample sizes, and another that reduces the time complexity to…
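The analytical significance equation is not given in this summary, so it is not reproduced here. As a generic cross-check that assumes only exchangeability of labels under the null hypothesis, a permutation test can estimate the p-value of an observed score empirically. This is a minimal sketch of a standard, swapped-in technique, not the paper's formula; the helpers `ddks_statistic` and `permutation_pvalue` are hypothetical, and the statistic again assumes the Fasano-style orthant definition with every pooled point as an origin.

```python
import numpy as np


def ddks_statistic(p, q):
    """Fasano-style d-dimensional KS distance; p is (n, d), q is (m, d), both 2-D."""
    d = p.shape[1]
    weights = 2 ** np.arange(d)
    best = 0.0
    for origin in np.concatenate([p, q]):  # every pooled point is an origin
        # orthant index of each point: bit j set when coordinate j > origin[j]
        fp = np.bincount((p > origin).astype(np.int64) @ weights, minlength=2 ** d)
        fq = np.bincount((q > origin).astype(np.int64) @ weights, minlength=2 ** d)
        best = max(best, float(np.abs(fp / len(p) - fq / len(q)).max()))
    return best


def permutation_pvalue(p, q, n_perm=200, seed=0):
    """Estimate the significance of the observed score by relabelling pooled points.

    Under the null hypothesis that p and q come from the same distribution,
    sample labels are exchangeable, so scores of shuffled splits are draws
    from the statistic's null distribution. Returns (observed, p_value).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    observed = ddks_statistic(p, q)
    pooled = np.concatenate([p, q])
    n = len(p)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if ddks_statistic(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            exceed += 1
    # add-one correction keeps the estimate strictly positive
    return observed, (exceed + 1) / (n_perm + 1)
```

Each call costs n_perm + 1 full evaluations of the statistic, which is the repeated work that a closed-form significance calculation avoids.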
