Algorithms for Estimating Information Distance with Application to Bioinformatics and Linguistics
Alexei Kaltchenko

TL;DR
This paper examines how to estimate information distance with data compression, covering both Kolmogorov complexity-based distances and an alternative distance based on relative entropy rate, and argues that the relative entropy-based distance is the more relevant one in bioinformatics and computational linguistics.
Contribution
It discusses compression-based algorithms for approximating the normalized information distance and for estimating a relative entropy-based distance, and compares the relevance of the two kinds of distance in bioinformatics and linguistics.
Findings
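Based on available biological and linguistic data, the distance based on relative entropy rate is more relevant in bioinformatics and computational linguistics than the distances based on Kolmogorov complexity.
Optimal algorithms for data compression with side information can approximate the normalized information distance.
The relative entropy-based distance can itself be estimated with compression-based algorithms.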
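To make the compression-based approximation concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes that zlib's preset dictionary (`zdict`) can stand in for "side information" and that compressed length can stand in for Kolmogorov complexity. zlib is far from an optimal compressor and only uses the last 32 KB of a dictionary, so the values are rough. The helpers `pseudo_dna` and `mutate` are hypothetical and exist only to produce demo data.

```python
import hashlib
import zlib


def c(x: bytes) -> int:
    """Compressed length of x in bytes: a computable stand-in for K(x)."""
    return len(zlib.compress(x, 9))


def c_given(x: bytes, side: bytes) -> int:
    """Compressed length of x with `side` as a preset dictionary.

    Assumption: this is a rough stand-in for the conditional complexity K(x | side).
    """
    if not side:
        return c(x)
    comp = zlib.compressobj(level=9, zdict=side)
    return len(comp.compress(x) + comp.flush())


def normalized_distance(x: bytes, y: bytes) -> float:
    """max(C(x|y), C(y|x)) / max(C(x), C(y)), with C in place of K."""
    return max(c_given(x, y), c_given(y, x)) / max(c(x), c(y))


def pseudo_dna(seed: bytes, n: int) -> bytes:
    """Hypothetical demo helper: deterministic pseudo-random string over ACGT."""
    out = bytearray()
    block = seed
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out.extend(b"ACGT"[v % 4] for v in block)
    return bytes(out[:n])


def mutate(x: bytes, step: int = 200) -> bytes:
    """Hypothetical demo helper: change one base every `step` positions."""
    y = bytearray(x)
    for i in range(0, len(y), step):
        y[i] = ord("C") if y[i] == ord("A") else ord("A")
    return bytes(y)


if __name__ == "__main__":
    x = pseudo_dna(b"x", 2000)
    near = mutate(x)                # x with a few point mutations
    far = pseudo_dna(b"z", 2000)    # unrelated sequence
    print("near:", normalized_distance(x, near))
    print("far: ", normalized_distance(x, far))
```

With this setup, a sequence and its lightly mutated copy should come out much closer than two unrelated sequences. Because the compressor adds header overhead, the estimate can slightly exceed 1.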
Abstract
After reviewing unnormalized and normalized information distances based on incomputable notions of Kolmogorov complexity, we discuss how Kolmogorov complexity can be approximated by data compression algorithms. We argue that optimal algorithms for data compression with side information can be successfully used to approximate the normalized distance. Next, we discuss an alternative information distance, which is based on relative entropy rate (also known as Kullback-Leibler divergence), and compression-based algorithms for its estimation. Based on available biological and linguistic data, we arrive at the unexpected conclusion that in Bioinformatics and Computational Linguistics this alternative distance is more relevant and important than the ones based on Kolmogorov complexity.
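For reference, the quantities named in the abstract are commonly written as follows. These are the standard textbook forms, given here as an aid to the reader; the paper's own notation and exact definitions may differ. Here K(x) is the Kolmogorov complexity of a string x, K(x | y) is the conditional complexity of x given y, and P and Q are stationary sources (the symbols are assumptions of this sketch).

```latex
% Unnormalized information distance (max form), up to additive logarithmic terms
E(x, y) = \max\{K(x \mid y),\, K(y \mid x)\}

% Normalized information distance
d(x, y) = \frac{\max\{K(x \mid y),\, K(y \mid x)\}}{\max\{K(x),\, K(y)\}}

% Relative entropy rate between sources P and Q, when the limit exists
\bar{D}(P \,\|\, Q) = \lim_{n \to \infty} \frac{1}{n} \sum_{x^n} P(x^n) \log \frac{P(x^n)}{Q(x^n)}
```

Unlike the two Kolmogorov-based quantities, relative entropy is not symmetric in its arguments, so a distance built on it has to take that into account.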
Taxonomy
Topics: Computability, Logic, AI Algorithms · Fractal and DNA sequence analysis · Algorithms and Data Compression
