Algorithmic statistics revisited
Nikolay Vereshchagin, Alexander Shen

TL;DR
This paper revisits algorithmic statistics through stochasticity profiles: curves, defined with notions from algorithmic information theory, that capture the trade-off between a model's complexity and its adequacy for the observed data.
Contribution
It surveys four equivalent definitions of stochasticity profiles and the results relating them to each other in the context of algorithmic statistics.
Findings
Stochasticity profiles can be characterized in four equivalent ways.
The survey links randomness deficiency, minimal description length, position in lists of simple strings, and Kolmogorov complexity with decompression time bounded by the busy beaver function.
The paper clarifies the theoretical foundations of model adequacy in algorithmic statistics.
Abstract
The mission of statistics is to provide adequate statistical hypotheses (models) for observed data. But what is an "adequate" model? To answer this question, one needs the notions of algorithmic information theory. It turns out that for every data string one can naturally define a "stochasticity profile", a curve that represents the trade-off between the complexity of a model and its adequacy. This curve has four different equivalent definitions in terms of (1) randomness deficiency, (2) minimal description length, (3) position in the lists of simple strings and (4) Kolmogorov complexity with decompression time bounded by the busy beaver function. We present a survey of the corresponding definitions and results relating them to each other.
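For orientation, definitions (1) and (2) can be written down in the standard notation of this area. This is a sketch under stated assumptions, not the paper's exact formulation: models are taken to be finite sets A containing the string x, C denotes Kolmogorov complexity, #A is the number of elements of A, and the symbols d, δ, P_x and Q_x are used here for illustration.

```latex
% Assumptions: x is a binary string of length n; a model is a finite set A with x \in A.
\begin{align*}
  d(x \mid A)  &= \log_2 \#A - C(x \mid A)   && \text{randomness deficiency of } x \text{ in } A \\
  \delta(x, A) &= C(A) + \log_2 \#A - C(x)  && \text{optimality deficiency (two-part code length minus } C(x)\text{)} \\
  P_x &= \{(\alpha,\beta) : \exists A \ni x,\ C(A) \le \alpha,\ d(x \mid A) \le \beta\} \\
  Q_x &= \{(\alpha,\beta) : \exists A \ni x,\ C(A) \le \alpha,\ \delta(x, A) \le \beta\}
\end{align*}
```

A pair (α, β) lies in P_x when some model of complexity at most α makes x look random up to deficiency β; both sets are upward closed, and the boundary of P_x is the stochasticity profile curve. In the standard treatment P_x and Q_x coincide up to O(log n) terms, which is the sense in which definitions (1) and (2) are "equivalent".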
Taxonomy
Topics: Computability, Logic, AI Algorithms · Algorithms and Data Compression · Benford’s Law and Fraud Detection
