A conditional compression distance that unveils insights of the genomic evolution
Diogo Pratas, Armando J. Pinho

TL;DR
This paper introduces a new genomic distance measure based on conditional compression, revealing novel evolutionary insights by comparing primate and rodent genomes using a specially designed compressor.
Contribution
It proposes the Normalized Conditional Compression Distance (NCCD), a novel metric utilizing conditional information content for genomic sequence comparison.
Findings
Measured chromosomal distances among Hominidae primates and between Muroidea rodents (rat and mouse)
Uncovered new insights into evolutionary relationships
Demonstrated effectiveness of the conditional compression approach
Abstract
We describe a compression-based distance for genomic sequences. Instead of using the usual conjoint information content, as in the classical Normalized Compression Distance (NCD), it uses the conditional information content. To compute this Normalized Conditional Compression Distance (NCCD), we need a normal conditional compressor, which we built using a mixture of static and dynamic finite-context models. Using this approach, we measured chromosomal distances between Hominidae primates and also between Muroidea (rat and mouse), and observed several evolutionary insights that have not so far been reported in the literature.
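The abstract does not spell out the formula, so the following is only a minimal sketch under stated assumptions. It assumes the distance takes the form NCCD(x, y) = max{C(x|y), C(y|x)} / max{C(x), C(y)}, by analogy with the normalized information distance. It also replaces the authors' mixture of static and dynamic finite-context models with a single order-k finite-context model whose counts are primed on the conditioning sequence and then keep adapting. The names `estimated_bits` and `nccd`, and the parameters `k` and `alpha`, are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def estimated_bits(x, reference="", k=8, alpha=1 / 16):
    """Estimate the bits needed to encode x with an order-k finite-context model.

    Counts are first primed on `reference` (empty for C(x), y for C(x|y)),
    then keep adapting while x is processed.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(reference)):
        counts[reference[i - k:i]][reference[i]] += 1

    total = 0.0
    for i, symbol in enumerate(x):
        context = x[max(0, i - k):i]  # shorter contexts at the start of x
        seen = counts[context]
        # Additive-smoothing estimate of P(symbol | context).
        prob = (seen[symbol] + alpha) / (sum(seen.values()) + alpha * len(ALPHABET))
        total -= math.log2(prob)
        seen[symbol] += 1  # dynamic update
    return total


def nccd(x, y, k=8):
    """Assumed form: max{C(x|y), C(y|x)} / max{C(x), C(y)}."""
    conditional = max(estimated_bits(x, y, k), estimated_bits(y, x, k))
    return conditional / max(estimated_bits(x, k=k), estimated_bits(y, k=k))


# Synthetic data only: a random sequence, a lightly mutated copy, an unrelated one.
rng = random.Random(0)
x = "".join(rng.choice(ALPHABET) for _ in range(2000))
y = "".join(rng.choice(ALPHABET) if rng.random() < 0.05 else c for c in x)
z = "".join(rng.choice(ALPHABET) for _ in range(2000))
print(nccd(x, x), nccd(x, y), nccd(x, z))
```

On such synthetic data the value should grow from the identical pair to the mutated pair to the unrelated pair. Because this toy model is not an ideal compressor, values slightly above 1 are possible for unrelated sequences.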
Taxonomy
Topics: Computability, Logic, AI Algorithms · Fractal and DNA sequence analysis · Algorithms and Data Compression
