Vector Encoding of Phylogenetic Trees by Ordered Leaf Attachment
David Harry Richman, Cheng Zhang, Frederick A. Matsen

TL;DR
This paper introduces a method for encoding binary, rooted phylogenetic tree topologies as integer vectors, as part of work connecting phylogenetics with machine learning.
Contribution
The ordered leaf attachment (OLA) method encodes each binary, rooted tree topology as a unique integer vector, with linear-time encoding and decoding and a simply described set of valid vectors.
Findings
OLA encoding and decoding run in time linear in the number of leaf nodes.
The OLA encoding induces a distance on the set of trees, which the paper examines in relation to the nearest neighbor interchange (NNI) and subtree prune and regraft (SPR) distances.
The set of OLA vectors that correspond to trees is a simply described subset of integer sequences.
Abstract
As part of work to connect phylogenetics with machine learning, there has been considerable recent interest in vector encodings of phylogenetic trees. We present a simple new “ordered leaf attachment” (OLA) method for uniquely encoding a binary, rooted phylogenetic tree topology as an integer vector. OLA encoding and decoding take linear time in the number of leaf nodes, and the set of vectors corresponding to trees is a simply-described subset of integer sequences. The OLA encoding is unique compared to other existing encodings in having these properties. The integer vector encoding induces a distance on the set of trees, and we investigate this distance in relation to the NNI and SPR distances.
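The abstract does not spell out the construction, so the following is only a minimal sketch of one leaf-attachment scheme that fits its description, not the authors' reference implementation. The labeling convention is an assumption: leaves are numbered 0 to n-1, the internal node created when leaf i is attached is labeled -i, and entry i of the vector names the existing node whose parent edge receives leaf i, so a valid vector satisfies -i < v_i < i. The function names `decode`, `encode`, and the helpers are hypothetical. The decoder makes one constant-time update per leaf; the encoder shown here is a simple quadratic version for clarity, not a linear-time algorithm.

```python
def _to_nested(node, children):
    """Convert the child table into nested frozensets (unordered binary tree)."""
    if node >= 0:  # non-negative labels are leaves
        return node
    a, b = children[node]
    return frozenset((_to_nested(a, children), _to_nested(b, children)))


def decode(vec):
    """Build a tree on leaves 0..len(vec) by attaching leaves 1, 2, ... in order.

    Assumed convention: vec[i-1] is the label of the node that becomes the
    sister of leaf i; the new parent of that pair is labeled -i.
    """
    parent = {0: None}
    children = {}
    root = 0
    for i, target in enumerate(vec, start=1):
        if not -i < target < i:
            raise ValueError(f"entry for leaf {i} must lie strictly between {-i} and {i}")
        new = -i
        p = parent[target]
        parent[new] = p
        if p is None:
            root = new  # attaching above the current root
        else:
            a, b = children[p]
            children[p] = (new, b) if a == target else (a, new)
        children[new] = (target, i)
        parent[target] = new
        parent[i] = new
    return _to_nested(root, children)


def _min_leaf(t):
    return t if isinstance(t, int) else min(_min_leaf(c) for c in t)


def _label(t):
    """Label of a subtree's root: a leaf is itself; an internal node is
    minus the larger of its two children's smallest leaves."""
    if isinstance(t, int):
        return t
    a, b = t
    return -max(_min_leaf(a), _min_leaf(b))


def _remove_leaf(t, leaf):
    """Return (tree without `leaf`, label of the removed leaf's sister or None)."""
    if isinstance(t, int):
        return t, None
    a, b = t
    if a == leaf:
        return b, _label(b)
    if b == leaf:
        return a, _label(a)
    a2, la = _remove_leaf(a, leaf)
    b2, lb = _remove_leaf(b, leaf)
    return (a2, b2), (la if la is not None else lb)


def encode(tree, n):
    """Recover the vector by detaching leaves n-1, ..., 1 (quadratic sketch)."""
    vec = []
    for leaf in range(n - 1, 0, -1):
        tree, sister = _remove_leaf(tree, leaf)
        vec.append(sister)
    return vec[::-1]
```

Under this convention, `decode([0, -1, 1])` attaches leaf 1 beside leaf 0, leaf 2 above their common parent, and leaf 3 beside leaf 1, and `encode` applied to the result with `n=4` returns the same vector. Entry i has 2i-1 admissible values, so there are 1 · 3 · 5 ⋯ (2n-3) valid vectors, which equals the number of rooted binary tree topologies on n labeled leaves.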
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms · Fractal and DNA sequence analysis
