Consistent distribution-free $K$-sample and independence tests for univariate random variables
Ruth Heller, Yair Heller, Shachar Kaufman, Barak Brill, Malka Gorfine

TL;DR
This paper introduces consistent distribution-free tests for univariate independence and $K$-sample problems. The tests aggregate scores over all partitions of a fixed size, come with polynomial-time algorithms, and show strong power in simulations and on real data.
Contribution
The paper proposes new aggregation-based independence tests that are distribution-free, consistent, and computationally efficient, improving power over existing methods.
Findings
Tests are nearly as powerful as optimal partition-based tests.
Proposed algorithms run in polynomial time.
Regularized tests outperform existing methods in simulations.
Abstract
A popular approach for testing if two univariate random variables are statistically independent consists of partitioning the sample space into bins, and evaluating a test statistic on the binned data. The partition size matters, and the optimal partition size is data dependent. While for detecting simple relationships coarse partitions may be best, for detecting complex relationships a great gain in power can be achieved by considering finer partitions. We suggest novel consistent distribution-free tests that are based on summation or maximization aggregation of scores over all partitions of a fixed size. We show that our test statistics based on summation can serve as good estimators of the mutual information. Moreover, we suggest regularized tests that aggregate over all partition sizes, and prove those are consistent too. We provide polynomial-time algorithms, which are critical for…
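The summation-over-partitions idea in the abstract can be illustrated with a minimal brute-force sketch. It assumes the simplest case, 2×2 partitions with cut points placed between consecutive observed ranks, and uses the likelihood-ratio score of each resulting contingency table. The function names `sum_2x2_statistic` and `permutation_pvalue`, the choice of score, and the use of a permutation null are illustrative assumptions, not the authors' implementation or their polynomial-time algorithm, and ties are assumed absent.

```python
import numpy as np


def sum_2x2_statistic(x, y):
    """Sum of likelihood-ratio scores over all 2x2 rank-based partitions.

    Hypothetical helper for illustration; assumes no ties in x or y.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    # 0-based ranks: the statistic depends on the data only through ranks.
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    # cum[a, b] = number of points with x-rank <= a and y-rank <= b.
    grid = np.zeros((n, n))
    grid[rx, ry] = 1.0
    cum = grid.cumsum(axis=0).cumsum(axis=1)
    # i (resp. j) = number of points below the x-cut (resp. y-cut).
    i = np.arange(1, n)[:, None]
    j = np.arange(1, n)[None, :]
    o11 = cum[: n - 1, : n - 1]
    o12 = i - o11
    o21 = j - o11
    o22 = n - i - j + o11
    cells = (
        (o11, i * j / n),
        (o12, i * (n - j) / n),
        (o21, (n - i) * j / n),
        (o22, (n - i) * (n - j) / n),
    )
    total = 0.0
    for observed, expected in cells:
        # Empty cells contribute zero; replace 0 by 1 inside the log only.
        safe = np.where(observed > 0, observed, 1.0)
        total += np.sum(observed * np.log(safe / expected))
    return total


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for the summed statistic (illustrative only)."""
    rng = np.random.default_rng(seed)
    observed = sum_2x2_statistic(x, y)
    null = [sum_2x2_statistic(x, rng.permutation(y)) for _ in range(n_perm)]
    exceed = sum(s >= observed for s in null)
    return (1 + exceed) / (1 + n_perm)
```

Because the sketch uses only ranks, applying a strictly increasing transformation to either variable leaves the statistic unchanged, which is the sense in which such a test is distribution-free. The cost here is quadratic in the sample size per evaluation; finer partitions would need the more careful counting that the abstract's polynomial-time algorithms address.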
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Statistical Methods in Clinical Trials
