EvoZip: Efficient Compression of Large Collections of Evolutionary Trees
Balanand Jha, David Fernández-Baca, Akshay Deepak, Kumar Abhishek

TL;DR
EvoZip is a new phylogenetic tree compression method that outperforms TreeZip, the current state of the art, in both compression ratio and speed, enabling more efficient storage and retrieval of large tree collections.
Contribution
EvoZip introduces novel encoding schemes and compression techniques that improve upon TreeZip for phylogenetic tree compression.
Findings (relative to TreeZip)
71.6% better compression on average
80.71% less compression time on average
60.47% less decompression time on average
Abstract
Phylogenetic trees represent evolutionary relationships among sets of organisms. Popular phylogenetic reconstruction approaches typically yield hundreds to thousands of trees on a common leafset. Storing and sharing such large collections of trees requires a considerable amount of space and bandwidth. Furthermore, the huge size of phylogenetic tree databases can make search and retrieval operations time-consuming. Phylogenetic compression techniques are specialized methods that exploit redundant topological information to achieve better compression of phylogenetic trees. Here, we present EvoZip, a new approach for phylogenetic tree compression. On average, EvoZip achieves 71.6% better compression and takes 80.71% less compression time and 60.47% less decompression time than TreeZip, the current state-of-the-art algorithm for phylogenetic tree compression. While EvoZip is…
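To make "redundant topological information" concrete, here is a minimal sketch that counts how often each bipartition (the split of the leafset induced by an internal edge) recurs across a small collection of trees on a common leafset. Bipartition sharing is an assumed illustration of the general idea: the abstract does not describe EvoZip's encoding, so this is not EvoZip's (or TreeZip's) actual method. Trees are written as nested Python tuples, and the names `clusters`, `bipartitions`, and `shared_split_table` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a rooted tree as nested tuples of leaf names,
# e.g. (("A", "B"), ("C", "D")). Branch lengths are ignored in this sketch.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set under `tree`, recording every internal node's cluster in `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.add(leaves)
    return leaves


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of `tree`, each keyed by the side that does not
    contain the alphabetically first leaf, so both sides of a split map to one key."""
    found: Set[FrozenSet[str]] = set()
    all_leaves = clusters(tree, found)
    anchor = min(all_leaves)
    splits: Set[FrozenSet[str]] = set()
    for cluster in found:
        side = cluster if anchor not in cluster else all_leaves - cluster
        # Skip the root (empty side) and trivial splits that isolate a single leaf.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits


def shared_split_table(trees) -> Counter:
    """Count in how many trees of the collection each bipartition appears."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(bipartitions(tree))
    return counts


trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    (("A", "B"), (("C", "D"), "E")),
]
counts = shared_split_table(trees)
print(sum(counts.values()), len(counts))  # total split occurrences, distinct splits
```

Here the three trees contain six non-trivial splits in total but only four distinct ones, and the split AB | CDE occurs in every tree. A compressor could therefore store each distinct split once, together with a record of which trees contain it, instead of repeating it per tree.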
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Algorithms and Data Compression
