Using Data Compressors to Construct Rank Tests
Daniil Ryabko, Juergen Schmidhuber

TL;DR
This paper introduces nonparametric rank tests built on data compressors: homogeneity and component independence are tested by checking how well the binary string of sample labels, taken in the order of the jointly sorted data, can be compressed.
Contribution
It proposes a new approach to nonparametric testing based on data compression techniques, extending the applicability of compressor-based methods to homogeneity and independence testing.
Findings
The compressor-based test is valid against all alternatives when using an ideal compressor.
The method successfully reduces component independence testing to homogeneity testing.
The approach is straightforward to extend to multiple samples.
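The reduction from component independence to homogeneity can be illustrated with a minimal sketch. It assumes the data are a list of two-component observations; the function name `split_for_independence` and the choice to permute the second component are hypothetical, picked only for illustration. If the components are independent, permuting one of them leaves the joint distribution unchanged, so the two returned samples are homogeneous; if they are dependent, the permutation breaks the dependence and the two samples differ in distribution.

```python
import random


def split_for_independence(pairs, rng):
    """Build two samples from paired data (illustrative helper, not from the paper).

    The first sample is the first half of the data, unchanged. The second
    sample is the second half with its second component randomly permuted.
    """
    half = len(pairs) // 2
    first = list(pairs[:half])
    second = pairs[half:]

    # Shuffle only the second component across the observations of the second half.
    ys = [y for _, y in second]
    rng.shuffle(ys)
    permuted = [(x, y) for (x, _), y in zip(second, ys)]
    return first, permuted


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(i, 10 * i) for i in range(10)]
    first, permuted = split_for_independence(data, rng)
    print(first)
    print(permuted)
```

The two samples would then be passed to a homogeneity test. How multivariate observations are ordered for that test is not covered by this sketch.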
Abstract
Nonparametric rank tests for homogeneity and component independence are proposed, which are based on data compressors. For homogeneity testing the idea is to compress the binary string obtained by ordering the two joint samples and writing 0 if the element is from the first sample and 1 if it is from the second sample, breaking ties by randomization (extension to the case of multiple samples is straightforward). The null hypothesis of homogeneity should be rejected if the string is compressed (to a certain degree) and accepted otherwise. We show that such a test obtained from an ideal data compressor is valid against all alternatives. Component independence is reduced to homogeneity testing by constructing two samples, one of which is the first half of the original and the other is the second half with one of the components randomly permuted.
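The homogeneity procedure can be sketched for real-valued samples as follows. This is a hedged illustration, not the paper's test: `zlib` stands in for the ideal compressor, and the rejection threshold is replaced by a Monte Carlo permutation calibration, which relies only on the fact that under homogeneity every arrangement of the labels is equally likely. The function names and the parameters `n_perm` and `seed` are assumptions made for this example.

```python
import random
import zlib


def label_string(sample_a, sample_b, rng):
    """Sort the pooled samples and return the labels (0 or 1) in sorted order.

    Ties between equal values are broken by an independent random key.
    """
    tagged = [(x, rng.random(), 0) for x in sample_a]
    tagged += [(y, rng.random(), 1) for y in sample_b]
    tagged.sort(key=lambda t: (t[0], t[1]))
    return bytes(label for _, _, label in tagged)


def compression_pvalue(sample_a, sample_b, n_perm=200, seed=0):
    """Monte Carlo p-value: how often a random relabeling compresses at least as well.

    zlib is only a practical stand-in for an ideal compressor (assumption).
    """
    rng = random.Random(seed)
    observed = label_string(sample_a, sample_b, rng)
    observed_len = len(zlib.compress(observed, 9))

    pool = list(observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # random arrangement with the same label counts
        if len(zlib.compress(bytes(pool), 9)) <= observed_len:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    gen = random.Random(42)
    a = [gen.gauss(0.0, 1.0) for _ in range(300)]
    b = [gen.gauss(3.0, 1.0) for _ in range(300)]
    # Homogeneity would be rejected when this p-value falls below the chosen level.
    print(compression_pvalue(a, b))
```

A small p-value means the observed label string compresses better than almost all random relabelings, which is evidence against homogeneity. Which alternatives such a test can detect depends on the regularities the chosen compressor is able to exploit.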
Taxonomy
Topics: Algorithms and Data Compression · Computability, Logic, AI Algorithms · Fractal and DNA sequence analysis
