Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance
Nadia Tahiri, Bernard Fichet, Vladimir Makarenkov

TL;DR
This paper introduces an efficient k-means-based method that uses the Robinson and Foulds distance to infer multiple alternative consensus trees and supertrees, capturing the distinct evolutionary patterns present in large sets of gene phylogenies.
Contribution
The paper presents a new approach to clustering phylogenetic trees that is faster than existing techniques and handles heterogeneous data, together with adaptations of standard cluster validity indices to tree clustering.
Findings
Successfully applied to SARS-CoV-2 and betacoronavirus datasets
Identified multiple evolutionary patterns with alternative supertrees
Faster than existing tree clustering techniques
Abstract
Each gene has its own evolutionary history which can substantially differ from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. The output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. We describe a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some interesting properties of the…
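As a rough illustration of the distance the abstract refers to, the sketch below computes the Robinson and Foulds distance between two trees on the same leaf set as the number of nontrivial bipartitions (splits) found in one tree but not the other. The nested-tuple tree encoding and the function names (`leaf_set`, `bipartitions`, `rf_distance`) are assumptions made for this example only. It does not reproduce the paper's adapted k-means algorithm or its objective function.

```python
def leaf_set(tree):
    """Return the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the nontrivial splits induced by the internal edges of a tree.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same split always gets the same representation regardless
    of where the tree happens to be rooted.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # arbitrary but deterministic reference leaf
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson and Foulds distance: size of the symmetric difference of splits."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("this sketch assumes both trees share the same leaves")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))
```

For instance, `rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` evaluates to 2, since each tree has one internal split that the other lacks. A clustering procedure would evaluate such distances between gene trees and candidate cluster representatives; how the paper does this efficiently is not shown here.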
Taxonomy
Topics: Bioinformatics and Genomic Networks · Metabolomics and Mass Spectrometry Studies · Gene expression and cancer classification
Methods: k-Means Clustering
