# Vector Encoding of Phylogenetic Trees by Ordered Leaf Attachment

**Authors:** David Harry Richman, Cheng Zhang, Frederick A. Matsen

PMC · DOI: 10.1007/s11538-026-01611-9 · 2026-03-18

## TL;DR

This paper introduces ordered leaf attachment (OLA), a method that encodes binary, rooted phylogenetic tree topologies as integer vectors, motivated by work connecting phylogenetics with machine learning.

## Contribution

OLA uniquely encodes each binary, rooted tree topology as an integer vector, with linear-time encoding and decoding and a simply described set of valid vectors.

## Key findings

- OLA encoding and decoding run in linear time relative to the number of leaf nodes.
- The OLA encoding induces a distance on trees, which the paper investigates in relation to the nearest-neighbor interchange (NNI) and subtree prune and regraft (SPR) distances.
- The set of OLA vectors is a simply-described subset of integer sequences.
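
The findings above can be illustrated with a minimal Python sketch of a leaf-attachment encoding. The abstract does not spell out the construction, so everything here is an assumption made for illustration: leaves are numbered `0..n-1` and attached in that order, entry `a_i` names the node whose parent edge receives leaf `i`, an internal node created when attaching leaf `i` is labeled `-i`, and valid entries satisfy `-i < a_i < i`. The function names (`ola_encode`, `ola_decode`, and the helpers) are hypothetical, and this version favors clarity over speed, so it is not the linear-time algorithm the paper describes.

```python
# Trees are nested 2-tuples with integer leaves, e.g. ((0, 1), 2).
# Labeling convention (an assumption for this sketch): a leaf is labeled by its
# number; an internal node is labeled -max(min leaf of left, min leaf of right),
# which equals -i for the node created when leaf i was attached.

def min_leaf(t):
    return t if isinstance(t, int) else min(min_leaf(t[0]), min_leaf(t[1]))

def label(t):
    if isinstance(t, int):
        return t
    return -max(min_leaf(t[0]), min_leaf(t[1]))

def attach(t, a, i):
    """Attach leaf i on the edge above the node labeled a."""
    if label(t) == a:
        return (t, i)
    if isinstance(t, int):
        return t
    return (attach(t[0], a, i), attach(t[1], a, i))

def ola_decode(code):
    """Build a tree from code = [a_1, ..., a_{n-1}], starting from leaf 0."""
    tree = 0
    for i, a in enumerate(code, start=1):
        if not -i < a < i:
            raise ValueError(f"entry {i} must satisfy -{i} < a < {i}, got {a}")
        tree = attach(tree, a, i)
    return tree

def remove_leaf(t, i):
    """Return (tree without leaf i, label of i's sibling), or None if absent."""
    if isinstance(t, int):
        return None
    left, right = t
    if left == i:
        return right, label(right)
    if right == i:
        return left, label(left)
    found = remove_leaf(left, i)
    if found is not None:
        return (found[0], right), found[1]
    found = remove_leaf(right, i)
    if found is not None:
        return (left, found[0]), found[1]
    return None

def count_leaves(t):
    return 1 if isinstance(t, int) else count_leaves(t[0]) + count_leaves(t[1])

def ola_encode(tree):
    """Undo the attachments from the last leaf to the first, recording siblings."""
    code = []
    for i in range(count_leaves(tree) - 1, 0, -1):
        tree, a = remove_leaf(tree, i)
        code.append(a)
    return code[::-1]

print(ola_decode([0, -1, 1]))            # ((0, (1, 3)), 2)
print(ola_encode(((0, (1, 3)), 2)))      # [0, -1, 1]
```

In this sketch the encoder simply reverses the decoder: removing the highest-numbered leaf exposes the node it was attached above, so the round trip recovers the original vector. The repeated `min_leaf` calls make it quadratic or worse; a linear-time version would need to precompute labels and use parent pointers.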

## Abstract

As part of work to connect phylogenetics with machine learning, there has been considerable recent interest in vector encodings of phylogenetic trees. We present a simple new “ordered leaf attachment” (OLA) method for uniquely encoding a binary, rooted phylogenetic tree topology as an integer vector. OLA encoding and decoding take linear time in the number of leaf nodes, and the set of vectors corresponding to trees is a simply-described subset of integer sequences. The OLA encoding is unique compared to other existing encodings in having these properties. The integer vector encoding induces a distance on the set of trees, and we investigate this distance in relation to the NNI and SPR distances.
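
A short counting check shows why a "simply-described subset of integer sequences" can correspond exactly to tree topologies. It assumes (this is not stated in the abstract) that the valid vectors for $n$ leaves are the sequences $(a_1, \dots, a_{n-1})$ with $-i < a_i < i$, so that entry $a_i$ has $2i - 1$ possible values:

```latex
\[
  \#\{(a_1,\dots,a_{n-1}) : -i < a_i < i\}
  \;=\; \prod_{i=1}^{n-1} (2i - 1)
  \;=\; 1 \cdot 3 \cdot 5 \cdots (2n - 3)
  \;=\; (2n - 3)!!
\]
```

The value $(2n-3)!!$ is the standard count of rooted binary tree topologies on $n$ labeled leaves, so under this assumption the number of valid vectors matches the number of trees, as a unique encoding requires. For $n = 4$ both counts equal $1 \cdot 3 \cdot 5 = 15$.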

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12999618/full.md

---
Source: https://tomesphere.com/paper/PMC12999618