Data compression and learning in time sequences analysis

A. Puglisi; D. Benedetto; E. Caglioti; V. Loreto; and A. Vulpiani

arXiv:cond-mat/0207321·cond-mat.stat-mech·November 7, 2009

TL;DR

This paper investigates how data compression algorithms learn and adapt to new sequences, revealing a universal scaling function that describes the learning process and its dependence on sequence similarity, with applications in sequence recognition and segmentation.

Contribution

It introduces a universal learning function for compression schemes and demonstrates its applicability across different systems and in recognizing dynamical systems from sequences.

Findings

01. Existence of a crossover length depending on relative entropy

02. Scaling function describes the learning process in compression

03. Application to dynamical system recognition and sequence segmentation

Abstract

Motivated by the problem of defining a distance between two sequences of characters, we investigate the so-called learning process of typical sequential data compression schemes. We focus on how a compression algorithm optimizes its features at the interface between two different sequences $A$ and $B$ while zipping the sequence $A + B$, obtained by simply appending $B$ after $A$. We show the existence of a universal scaling function (the so-called learning function) which rules the way in which the compression algorithm learns a sequence $B$ after having compressed a sequence $A$. In particular, it turns out that there exists a crossover length for the sequence $B$, which depends on the relative entropy between $A$ and $B$, below which the compression algorithm does not learn the sequence $B$ (measuring in this way the relative entropy between $A$ and $B$) and above…
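The "zip $A + B$" measurement described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: it uses Python's `zlib` (DEFLATE, an LZ77-type scheme) as a stand-in for a "typical sequential data compression scheme", and the helper names (`compressed_size`, `extra_cost`, `random_sequence`, `cost_per_char`) and the binary test sources are hypothetical. The quantity tracked is the number of compressed bytes added by appending a prefix of $B$ to $A$, as a function of the prefix length.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) encoding of data."""
    return len(zlib.compress(data, 9))


def extra_cost(a: bytes, b: bytes) -> int:
    """Compressed bytes added when b is appended to a before zipping."""
    return compressed_size(a + b) - compressed_size(a)


def random_sequence(length: int, p_one: float, seed: int) -> bytes:
    """Hypothetical test source: i.i.d. characters '0'/'1' with P('1') = p_one."""
    rng = random.Random(seed)
    # 49 and 48 are the ASCII codes of '1' and '0'.
    return bytes(49 if rng.random() < p_one else 48 for _ in range(length))


def cost_per_char(a: bytes, b: bytes, lengths):
    """Extra compressed bits per character of b, for growing prefixes of b."""
    return [(n, 8 * extra_cost(a, b[:n]) / n) for n in lengths]


if __name__ == "__main__":
    # Assumption: A + B stays below zlib's 32 KB window, so the compressor
    # can still refer back to A while it encodes B.
    a = random_sequence(20_000, 0.05, seed=1)
    b = random_sequence(8_000, 0.5, seed=2)
    for n, bits in cost_per_char(a, b, [100, 500, 2_000, 8_000]):
        print(n, round(bits, 3))
```

Plotting the per-character cost against the prefix length, for several choices of the two sources, is one way to look for the crossover behaviour the abstract describes. The window size of the chosen compressor limits how long $A$ and $B$ can be in such an experiment.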
