On Finite Memory Universal Data Compression and Classification of Individual Sequences
Jacob Ziv

TL;DR
This paper explores the limits of finite-memory universal data compression and classification for individual sequences, demonstrating that context tree coding and a specific classifier are essentially optimal for these tasks.
Contribution
It shows that context tree coding nearly achieves the best possible universal compression for finite blocks and introduces an optimal universal classifier with linear storage complexity.
Findings
Context tree coding nearly achieves optimal universal compression.
A universal context classifier with linear storage is essentially optimal.
The results support the theoretical basis of probabilistic suffix trees (PST) in learning and biology.
Abstract
Consider the case where consecutive blocks of N letters of a semi-infinite individual sequence X over a finite alphabet are being compressed into binary sequences by some one-to-one mapping. No a-priori information about X is available at the encoder, which must therefore adopt a universal data-compression algorithm. It is known that if the universal LZ77 data compression algorithm is successively applied to N-blocks then the best error-free compression for the particular individual sequence X is achieved, as N tends to infinity. The best possible compression that may be achieved by any universal data compression algorithm for finite N-blocks is discussed. It is demonstrated that context tree coding essentially achieves it. Next, consider a device called a classifier (or discriminator) that observes an individual training sequence X. The classifier's task is to examine individual test…
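As a rough illustration of the block-wise setting in the abstract, the following Python sketch splits a sequence into consecutive N-blocks and computes, for each block, the empirical conditional entropy of a letter given its k preceding letters. This is not the paper's context tree coding: it uses a single fixed context length k rather than a tree of variable-length contexts, and it ignores the cost of describing the model. The function names `split_blocks` and `empirical_context_entropy` and the parameter `k` are assumptions introduced here for illustration only.

```python
from collections import Counter
from math import log2


def split_blocks(x, n):
    """Split x into consecutive non-overlapping blocks of n letters.

    A trailing fragment shorter than n is dropped.
    """
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]


def empirical_context_entropy(block, k):
    """Empirical conditional entropy, in bits per letter, of a letter
    given the k letters that precede it, measured inside one block.

    Illustrative fixed-order statistic only (hypothetical helper);
    a context tree would allow the context length to vary.
    """
    ctx_counts = Counter()   # occurrences of each length-k context
    pair_counts = Counter()  # occurrences of (context, next letter)
    for i in range(k, len(block)):
        ctx = block[i - k:i]
        ctx_counts[ctx] += 1
        pair_counts[(ctx, block[i])] += 1

    total = len(block) - k
    if total <= 0:
        return 0.0

    h = 0.0
    for (ctx, _letter), c in pair_counts.items():
        # Joint frequency times log of the conditional frequency.
        h -= (c / total) * log2(c / ctx_counts[ctx])
    return h


if __name__ == "__main__":
    x = "abababababababab"
    for block in split_blocks(x, 8):
        print(block, empirical_context_entropy(block, 1))
```

For a strictly alternating block such as `abababab`, one letter of context determines the next letter, so the order-1 value is zero, while the order-0 value (no context) is one bit per letter.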
Taxonomy
Topics: Algorithms and Data Compression · Computability, Logic, AI Algorithms · Machine Learning and Algorithms
