Information Distance in Multiples

Paul M.B. Vitanyi

arXiv:0905.3347 · cs.CV · May 21, 2009 · 1 citation

TL;DR

This paper extends the concept of information distance from pairs to multiples, analyzing its properties and practical approximation using compression algorithms for applications in pattern recognition and data mining.

Contribution

It introduces a theoretical framework for information distance in multiples and explores its properties, providing practical methods for approximation with real-world compression tools.

Findings

01 Analysis of maximal overlap and metricity in multiples
02 Demonstration of universality and minimal overlap properties
03 Validation of approximation methods using compression algorithms

Abstract

Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity, and normalized information distance in multiples. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.

Index Terms: Information distance, multiples, pattern recognition, data mining, similarity, Kolmogorov complexity
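The compression-based approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the information distance of a list is estimated as the compressed length of the concatenated list minus the smallest compressed length of any single item, that is, C(x1...xn) - min_i C(xi). This formula is an assumption used here for illustration and is not stated in the text above. The choice of zlib as the compressor and the names compressed_len and approx_multiple_distance are likewise hypothetical.

```python
import random
import zlib
from typing import Sequence


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    Assumption: this stands in for the (uncomputable) Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def approx_multiple_distance(items: Sequence[bytes]) -> int:
    """Hypothetical estimate of information distance for a finite list.

    Assumed formula: C(x1...xn) - min_i C(xi), where C is compressed length
    and x1...xn is the plain concatenation of the items.
    """
    if not items:
        raise ValueError("need at least one item")
    joint = compressed_len(b"".join(items))
    return joint - min(compressed_len(x) for x in items)


# Illustrative data: one highly repetitive text and two pseudo-random blobs.
text = b"the quick brown fox jumps over the lazy dog " * 50
rng = random.Random(0)  # fixed seed so the example is repeatable
noise_a = bytes(rng.getrandbits(8) for _ in range(2000))
noise_b = bytes(rng.getrandbits(8) for _ in range(2000))

similar = approx_multiple_distance([text, text, text])
dissimilar = approx_multiple_distance([text, noise_a, noise_b])
print("similar list:", similar)
print("dissimilar list:", dissimilar)
```

A list of near-identical items should score much lower than a list of unrelated items, since the shared content compresses away in the concatenation. Real compressors have window and block limits, so results on large files depend on the compressor chosen.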

Taxonomy

Topics: Computability, Logic, AI Algorithms · Fractal and DNA sequence analysis · Machine Learning and Algorithms