Prediction by Compression

Joel Ratsaby

arXiv:1008.5078 · cs.IT · August 31, 2010 · 1 cites

TL;DR

This paper asks whether a black-box text compressor can be used to predict the next symbol in a text stream, using the length of the compressed data as the prediction criterion.

Contribution

The paper introduces a criterion based on the length of the compressed data for predicting the next symbol, and empirically examines the prediction error rate and its dependence on compression parameters.

Findings

01 Compression-based prediction error rate varies with compression parameters

02 The method provides a new approach to symbol prediction using black-box compressors

03 Empirical analysis demonstrates the viability of compression for prediction tasks

Abstract

It is well known that text compression can be achieved by predicting the next symbol in the stream of text data based on the history seen up to the current symbol. The better the prediction, the more skewed the conditional probability distribution of the next symbol and the shorter the codeword that needs to be assigned to represent this next symbol. What about the opposite direction? Suppose we have a black box that can compress a text stream. Can it be used to predict the next symbol in the stream? We introduce a criterion based on the length of the compressed data and use it to predict the next symbol. We examine empirically the prediction error rate and its dependence on some compression parameters.
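
The abstract does not spell out the criterion, so the following is only a minimal sketch of one plausible reading: append each candidate symbol to the history, compress the result with a black-box compressor, and predict the candidate that yields the shortest output. Python's zlib stands in for the black box, its compression level stands in for "some compression parameters", and the function names (compressed_length, predict_next, error_rate) are hypothetical, not taken from the paper.

```python
import zlib


def compressed_length(data: bytes, level: int = 9) -> int:
    """Length in bytes of the zlib-compressed data (the black box)."""
    return len(zlib.compress(data, level))


def predict_next(history: bytes, alphabet: bytes, level: int = 9) -> int:
    """Predict the next symbol as the candidate whose appending gives the
    shortest compressed output. This is an assumed criterion, not
    necessarily the paper's. Ties go to the earliest symbol in `alphabet`."""
    return min(
        alphabet,
        key=lambda s: compressed_length(history + bytes([s]), level),
    )


def error_rate(text: bytes, alphabet: bytes, start: int, level: int = 9) -> float:
    """Fraction of positions i >= start where the prediction made from
    text[:i] differs from the true symbol text[i]."""
    if not 0 <= start < len(text):
        raise ValueError("start must index a position inside text")
    errors = 0
    for i in range(start, len(text)):
        if predict_next(text[:i], alphabet, level) != text[i]:
            errors += 1
    return errors / (len(text) - start)


if __name__ == "__main__":
    sample = b"abcabcabcabcabcabcabcabcabcabc"
    # Compare the empirical error rate at two compression levels.
    for lvl in (1, 9):
        print(lvl, error_rate(sample, b"abc", start=6, level=lvl))
```

Because zlib reports lengths in whole bytes, many candidates tie on short histories, and the tie-breaking rule then decides the prediction; a finer-grained length measure would be needed to separate them.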

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Computability, Logic, AI Algorithms