Universal Codes as a Basis for Time Series Testing
Boris Ryabko, Jaakko Astola

TL;DR
This paper introduces a universal data compression-based approach for hypothesis testing in stationary ergodic processes, enabling nonparametric tests for various properties without prior distribution knowledge.
Contribution
The paper proposes a framework based on universal codes for hypothesis testing that applies to several problems in time series analysis, even when the distribution of the codeword lengths is unknown.
Findings
Standard archivers can be used to run the tests in practice
Enables nonparametric tests for goodness-of-fit, independence, serial independence, and homogeneity
Works without prior knowledge of distribution laws
Abstract
We suggest a new approach to hypothesis testing for stationary ergodic processes. In contrast to standard methods, the suggested approach makes it possible to construct tests based on any lossless data compression method, even if the distribution law of the codeword lengths is not known. We apply this approach to the following four problems: goodness-of-fit testing (or identity testing), testing for independence, testing of serial independence, and homogeneity testing, and we suggest nonparametric statistical tests for these problems. Importantly, the archivers used in practice can serve as the basis for the suggested tests.
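To make the idea of testing with an archiver concrete, here is a minimal Python sketch of a compression-based goodness-of-fit test. The abstract does not state a decision rule, so the rule used here is an assumption: reject the null hypothesis when the code length under the hypothesized distribution exceeds the compressed length by more than log2(1/alpha) bits. The null hypothesis (independent, uniformly distributed bytes), the choice of `zlib` as the archiver, and the function name `compression_identity_test` are all illustrative choices, not taken from the paper.

```python
import math
import zlib


def compression_identity_test(data: bytes, alpha: float = 0.01) -> bool:
    """Return True if H0 is rejected at significance level alpha.

    H0 (assumed for illustration): the bytes are i.i.d. and uniform,
    so every sequence of n bytes has probability 2**(-8 * n).
    """
    # Code length of the data under H0: -log2 of its probability, in bits.
    bits_under_h0 = 8 * len(data)

    # Code length produced by the archiver, in bits (zlib is a stand-in
    # for any lossless compressor; header and checksum are included).
    bits_compressed = 8 * len(zlib.compress(data, 9))

    # Assumed decision rule: reject when the archiver beats the H0 code
    # by more than log2(1 / alpha) bits.
    statistic = bits_under_h0 - bits_compressed
    return statistic > math.log2(1.0 / alpha)
```

Under these assumptions, a highly regular input such as `bytes(1000)` (one thousand zero bytes) is rejected, because the archiver encodes it far more compactly than 8000 bits. Pseudo-random bytes are not rejected, because the compressed output is no shorter than the input.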
Taxonomy
Topics: Algorithms and Data Compression · Computability, Logic, AI Algorithms · Fractal and DNA sequence analysis
