phylo2vec: a library for vector-based phylogenetic tree manipulation
Neil Scheidwasser, Ayush Nag, Matthew J Penn, Anthony MV Jakob, Frederik Mølkjær Andersen, Mark P Khurana, Landung Setiawan, David A Duchêne, and Samir Bhatt

TL;DR
phylo2vec is a high-performance software library that efficiently encodes, manipulates, and analyzes binary phylogenetic trees using vector-based representations, supporting large datasets in biology and linguistics.
Contribution
The paper introduces phylo2vec, a new library that improves phylogenetic tree handling with a vector-based approach, implemented in Rust and accessible via R and Python.
Findings
significantly faster tree manipulation and comparison
more memory-efficient representation of phylogenetic trees
broad accessibility through multi-language wrappers
Abstract
Phylogenetics is a fundamental component of evolutionary analysis frameworks in biology and linguistics. Recently, the advent of large-scale genomics and the SARS-CoV-2 pandemic have highlighted the necessity for phylogenetic software to handle large datasets. While significant efforts have focused on scaling optimisation algorithms, visualization, and lineage identification, an emerging body of research has been dedicated to efficient representations of data for genomes and phylogenetic trees. Compared to the traditional Newick format, which represents trees using strings of nested parentheses, modern tree representations utilize integer vectors to define the tree topology traversal. This approach offers several advantages, including easier manipulation, increased memory efficiency, and applicability to machine learning. Here, we present the latest release of phylo2vec (or Phylo2Vec),…
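To make the contrast between a Newick string and an integer vector more concrete, here is a minimal Python sketch. It is an illustration only and does not use the phylo2vec API: the function names (is_valid_vector, sample_vector, count_vectors) are hypothetical, and the constraint it relies on (a tree with n leaves is a vector of n - 1 integers, with entry i between 0 and 2i inclusive) is an assumption drawn from the published description of the Phylo2Vec encoding, not from the abstract above.

```python
import random
from math import prod


def is_valid_vector(v):
    """Check the assumed constraint: 0 <= v[i] <= 2*i at every position i."""
    return all(isinstance(x, int) and 0 <= x <= 2 * i for i, x in enumerate(v))


def sample_vector(n_leaves, rng=None):
    """Draw a random vector of length n_leaves - 1 that satisfies the constraint."""
    rng = rng or random.Random()
    # randint is inclusive on both ends, so entry i ranges over 0..2*i.
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


def count_vectors(n_leaves):
    """Count the valid vectors: position i admits 2*i + 1 choices."""
    return prod(2 * i + 1 for i in range(n_leaves - 1))


# A Newick string nests parentheses and has no fixed length.
newick_example = "((0,1),(2,3));"

# Under the assumed encoding, a 4-leaf tree is always exactly 3 integers.
vector_example = sample_vector(4, random.Random(0))
```

Under this assumption, count_vectors(n) equals 1 x 3 x ... x (2n - 3), which matches the known number of rooted binary trees on n labelled leaves. Sampling a random tree or checking validity then reduces to simple per-position integer operations, which is one way to read the "easier manipulation" advantage claimed in the abstract.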
Taxonomy
Topics: Genomics and Phylogenetic Studies · Data Mining Algorithms and Applications
