Compression, Generalization and Learning
Marco C. Campi, Simone Garatti

TL;DR
This paper develops a new theoretical framework relating compression functions to learning. It provides bounds on the probability of change of compression and supports hyper-parameter tuning without prior knowledge of the data distribution.
Contribution
It introduces a novel theory linking the probability of change of compression to the cardinality of the compressed set, with tight finite-sample bounds applicable in agnostic learning scenarios.
Findings
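Under suitable conditions, the cardinality of the compressed set is a consistent estimator of the probability of change of compression.
Tight finite-sample bounds on the probability of change of compression are derived.
The results apply without prior knowledge of the distribution of the observations.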
Abstract
A compression function is a map that slims down an observational set into a subset of reduced size, while preserving its informational content. In multiple applications, the condition that one new observation makes the compressed set change is interpreted as meaning that this observation brings in extra information and, in learning theory, this corresponds to misclassification, or misprediction. In this paper, we lay the foundations of a new theory that allows one to keep control of the probability of change of compression (which maps into the statistical "risk" in learning applications). Under suitable conditions, the cardinality of the compressed set is shown to be a consistent estimator of the probability of change of compression (without any upper limit on the size of the compressed set); moreover, unprecedentedly tight finite-sample bounds to evaluate the probability of change of compression…
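To make the notions of "compression function" and "change of compression" concrete, here is a minimal Python sketch. It is a toy illustration and not the authors' construction: the min/max compression rule, the uniform distribution, and all function names (`compress`, `changes_compression`, `estimate_change_probability`) are assumptions chosen for simplicity. The paper's actual estimator and finite-sample bounds are not reproduced here.

```python
import random

def compress(points):
    """Toy compression (an assumption): keep only the extreme points."""
    if not points:
        return set()
    return {min(points), max(points)}

def changes_compression(points, new_point):
    """True if adding new_point alters the compressed set."""
    return compress(points + [new_point]) != compress(points)

def estimate_change_probability(points, trials, rng):
    """Monte Carlo estimate of the probability of change,
    with new observations drawn from Uniform(0, 1) (an assumption)."""
    hits = sum(changes_compression(points, rng.random()) for _ in range(trials))
    return hits / trials

rng = random.Random(0)
sample = [rng.random() for _ in range(200)]   # observational set
compressed = compress(sample)                  # compressed subset
prob = estimate_change_probability(sample, 20000, rng)

print("cardinality of compressed set:", len(compressed))
print("empirical probability of change:", prob)
```

In this toy setting a new observation changes the compressed set exactly when it falls outside the interval spanned by the current sample, so the probability of change plays the role of the "risk" of the interval as a predictor. Here the compressed set stays small and the probability of change is correspondingly small.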
Taxonomy
Topics: Machine Learning and Algorithms · Computability, Logic, AI Algorithms · Algorithms and Data Compression
