A Vector Representation for Phylogenetic Trees
Cedric Chauve, Caroline Colijn, Louxin Zhang

TL;DR
This paper introduces a vector representation for rooted phylogenetic trees, together with a tree rearrangement operator (the HOP) and an associated metric, the HOP distance, which is computable in near-linear time and correlates better with the Subtree-Prune-and-Regraft distance than the Robinson-Foulds distance does.
Contribution
It presents a new vector encoding for rooted phylogenetic trees, a tree rearrangement operator called the HOP, and a tractable metric, the HOP distance, defined as the minimum number of HOPs needed to transform one tree into another.
Findings
HOP distance is computable in near-linear time.
HOP distance correlates better with the Subtree-Prune-and-Regraft (SPR) distance than the Robinson-Foulds distance does.
The representation can be extended to tree-child networks.
Abstract
Good representations for phylogenetic trees and networks are important for optimizing storage efficiency and implementation of scalable methods for the inference and analysis of evolutionary trees for genes, genomes and species. We introduce a new representation for rooted phylogenetic trees that encodes a binary tree on n taxa as a vector of length 2n in which each taxon appears exactly twice. Using this new tree representation, we introduce a novel tree rearrangement operator, called a HOP, that results in a tree space of diameter n and a quadratic neighbourhood size. We also introduce a novel metric, the HOP distance, which is the minimum number of HOPs to transform a tree into another tree. The HOP distance can be computed in near-linear time, a rare instance of a tree rearrangement distance that is tractable. Our experiments show that the HOP distance is better correlated to the…
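The abstract states only the shape of the encoding: a vector of length 2n in which each of the n taxa appears exactly twice. As a minimal sketch, the Python function below checks that shape property and nothing more. The function name `has_tree_vector_shape` and the example vectors are hypothetical; the paper's actual encoding and decoding rules are not given in this summary, so passing the check is a necessary condition for a valid encoding, not a sufficient one.

```python
from collections import Counter
from typing import Hashable, Sequence


def has_tree_vector_shape(vector: Sequence[Hashable], taxa: Sequence[Hashable]) -> bool:
    """Return True if `vector` has length 2n and contains each of the n taxa exactly twice.

    This checks only the shape property stated in the abstract. It does not
    verify that the vector decodes to a rooted binary tree.
    """
    taxa_set = set(taxa)
    if len(taxa_set) != len(taxa):
        return False  # taxon names must be distinct
    if len(vector) != 2 * len(taxa_set):
        return False  # wrong length for n taxa
    counts = Counter(vector)
    # Every entry must be a known taxon, and every taxon must occur twice.
    return set(counts) == taxa_set and all(c == 2 for c in counts.values())


# Hypothetical vectors over three taxa, used only to exercise the check.
print(has_tree_vector_shape([1, 2, 1, 3, 2, 3], [1, 2, 3]))
print(has_tree_vector_shape([1, 1, 1, 2, 2, 3], [1, 2, 3]))
```

A check like this is cheap (linear in the vector length) and can serve as a sanity test when reading or writing such vectors, before any decoding step is attempted.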
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics
