Data compression and learning in time sequences analysis
A. Puglisi, D. Benedetto, E. Caglioti, V. Loreto, and A. Vulpiani

TL;DR
This paper investigates how data compression algorithms learn and adapt to new sequences, revealing a universal scaling function that describes the learning process and its dependence on sequence similarity, with applications in sequence recognition and segmentation.
Contribution
It introduces a universal learning function for compression schemes and demonstrates its applicability across different systems and in recognizing dynamical systems from sequences.
Findings
Existence of a crossover length depending on relative entropy
Scaling function describes the learning process in compression
Application to dynamical system recognition and sequence segmentation
Abstract
Motivated by the problem of the definition of a distance between two sequences of characters, we investigate the so-called learning process of typical sequential data compression schemes. We focus on the problem of how a compression algorithm optimizes its features at the interface between two different sequences A and B while zipping the sequence A+B obtained by simply appending B after A. We show the existence of a universal scaling function (the so-called learning function) which rules the way in which the compression algorithm learns a sequence B after having compressed a sequence A. In particular, it turns out that there exists a crossover length for the sequence B, which depends on the relative entropy between A and B, below which the compression algorithm does not learn the sequence B (measuring in this way the relative entropy between A and B) and above…
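As a rough illustration of the setup described in the abstract, the sketch below measures how many extra compressed bytes a second sequence costs when it is appended to a first one that the compressor has already processed. It assumes Python's zlib (DEFLATE, an LZ77-based scheme) as a stand-in compressor; the helper names compressed_size, excess_size, and make_sequence are hypothetical and are not taken from the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data (level 9)."""
    return len(zlib.compress(data, 9))


def excess_size(a: bytes, b: bytes) -> int:
    """Extra compressed bytes needed for b once the compressor has seen a.

    Computed as size(a + b) - size(a). Note that zlib only looks back
    32 KiB, so a should be shorter than that for b to benefit from it.
    """
    return compressed_size(a + b) - compressed_size(a)


def make_sequence(alphabet: bytes, length: int, seed: int) -> bytes:
    """Pseudo-random sequence over alphabet (an illustrative toy source)."""
    rng = random.Random(seed)
    return bytes(rng.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    a = make_sequence(b"ab", 4000, seed=1)
    b_source = make_sequence(b"xyz", 2000, seed=2)
    # Excess cost per appended character as the appended piece grows.
    for n in (50, 200, 500, 1000, 2000):
        cost = excess_size(a, b_source[:n])
        print(n, cost, round(cost / n, 3))
```

Printing the per-character excess cost for growing lengths of the appended piece gives a crude view of how the compressor adapts to the new sequence; the toy sources and the chosen lengths are arbitrary and are not the paper's experimental setup.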
