Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding
Yibo Kong, George P. Tiley, Claudia Solis-Lemus

TL;DR
This paper introduces a novel split-weight embedding method that enables unsupervised clustering of phylogenetic trees, effectively revealing evolutionary relationships in both simulated and real data.
Contribution
The paper presents a new split-weight embedding technique that facilitates the application of standard clustering algorithms to phylogenetic trees, a previously challenging task.
Findings
Successfully recovers evolutionary relationships in simulated data
Effectively clusters real data from baobabs (genus Adansonia)
Demonstrates the utility of unsupervised learning in phylogenetics
Abstract
Unsupervised learning has become a staple in classical machine learning, successfully identifying clustering patterns in data across a broad range of domain applications. Surprisingly, despite its accuracy and elegant simplicity, unsupervised learning has not been sufficiently exploited in the realm of phylogenetic tree inference. The main reason for the delay in adoption of unsupervised learning in phylogenetics is the lack of a meaningful, yet simple, way of embedding phylogenetic trees into a vector space. Here, we propose the simple yet powerful split-weight embedding which allows us to fit standard clustering algorithms to the space of phylogenetic trees. We show that our split-weight embedded clustering is able to recover meaningful evolutionary relationships in simulated and real (Adansonia baobabs) data.
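The abstract does not spell out the embedding, so the following is only a minimal sketch under a stated assumption: each coordinate of the vector corresponds to one possible bipartition (split) of the taxa and holds the length of the branch inducing that split, or 0 if the tree does not contain it. The tree encoding, the helper names (`tree_splits`, `all_splits`, `embed`, `kmeans`, `quartet`), the toy branch lengths, and the small k-means loop are all hypothetical illustrations and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical tree encoding: a leaf is (name, branch_length);
# an internal node is ([children], branch_length).

def tree_splits(node, taxa, out):
    """Return the leaf set below `node`, recording each edge's split weight in `out`."""
    label, length = node
    if isinstance(label, str):
        below = frozenset([label])
    else:
        below = frozenset().union(*(tree_splits(child, taxa, out) for child in label))
    # Canonical side of the bipartition: the side without the reference taxon.
    ref = min(taxa)
    side = taxa - below if ref in below else below
    if side:  # the root has no parent edge, so its "split" is empty and skipped
        out[side] = out.get(side, 0.0) + length
    return below

def all_splits(taxa):
    """Enumerate every bipartition of `taxa` in a fixed order (2^(n-1) - 1 of them)."""
    ref = min(taxa)
    rest = sorted(taxa - {ref})
    return [frozenset(c) for k in range(1, len(rest) + 1)
            for c in combinations(rest, k)]

def embed(tree, taxa):
    """Assumed split-weight vector: branch length per split, 0 if the split is absent."""
    weights = {}
    tree_splits(tree, taxa, weights)
    return [weights.get(split, 0.0) for split in all_splits(taxa)]

def kmeans(points, centers, iters=10):
    """Toy k-means with caller-supplied initial centers (stand-in for any standard clusterer)."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
                  for pt in points]
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def quartet(a, b, c, d, inner):
    """Four-taxon tree grouping (a, b) against (c, d); `inner` is the internal branch length."""
    return ([([(a, 0.1), (b, 0.1)], inner), (c, 0.1), (d, 0.1)], 0.0)

taxa = frozenset("ABCD")
trees = [
    quartet("A", "B", "C", "D", 0.5),
    quartet("A", "B", "C", "D", 0.6),
    quartet("A", "C", "B", "D", 0.5),
    quartet("A", "C", "B", "D", 0.4),
]
vectors = [embed(t, taxa) for t in trees]
labels = kmeans(vectors, [vectors[0], vectors[-1]])
print(labels)
```

In this toy setup, trees sharing a topology differ only in one coordinate, while trees with conflicting topologies place their internal branch weight on different coordinates, which is why Euclidean clustering can separate them. A real analysis would parse Newick files and likely use an established clustering library rather than this hand-written loop.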
Taxonomy
Topics: Genomics and Phylogenetic Studies · Evolution and Paleontology Studies · Species Distribution and Climate Change
