Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance

Nadia Tahiri; Bernard Fichet; Vladimir Makarenkov

arXiv:2103.13343 · q-bio.PE · May 26, 2022 · Bioinform.

Open Access

TL;DR

This paper introduces an efficient k-means-based method that uses the Robinson and Foulds distance to infer multiple alternative consensus trees and supertrees, capturing the distinct evolutionary patterns present in large collections of gene trees.
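The Robinson and Foulds (RF) distance counts the bipartitions (splits) of the leaf set that occur in one tree but not in the other. The sketch below assumes each unrooted tree is already given as a list of its non-trivial splits over a common leaf set (a hypothetical input format; Newick parsing is out of scope) and returns the unnormalized RF distance. The function names are illustrative, and the paper may work with a normalized variant.

```python
from typing import FrozenSet, Iterable

Split = FrozenSet[str]  # one side of a bipartition of the leaf set


def canonical_split(side: Iterable[str], leaves: FrozenSet[str]) -> Split:
    """Orient a bipartition so that X|Y and Y|X map to the same key.

    Keeps the side that does not contain the smallest leaf label.
    """
    side = frozenset(side)
    return leaves - side if min(leaves) in side else side


def rf_distance(splits_a: Iterable[Iterable[str]],
                splits_b: Iterable[Iterable[str]],
                leaves: Iterable[str]) -> int:
    """Unnormalized Robinson and Foulds distance between two unrooted trees.

    Each tree is given as its non-trivial splits (one side of each bipartition).
    The distance is the number of splits present in exactly one of the trees.
    """
    leaf_set = frozenset(leaves)
    a = {canonical_split(s, leaf_set) for s in splits_a}
    b = {canonical_split(s, leaf_set) for s in splits_b}
    return len(a ^ b)  # size of the symmetric difference


leaves = "ABCDE"
t1 = [["A", "B"], ["D", "E"]]  # ((A,B),C,(D,E))
t2 = [["A", "C"], ["D", "E"]]  # ((A,C),B,(D,E))
print(rf_distance(t1, t2, leaves))  # 2
```

The trees share the split DE|ABC and differ on one split each, so the distance is 2. Orienting every split first means a split written as AB or as CDE is treated as the same bipartition.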

Contribution

It presents a new clustering approach for phylogenetic trees that is faster than existing tree clustering methods and handles heterogeneous data, and it adapts standard cluster validity indices to tree clustering.
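To illustrate what a cluster validity index looks like in tree space, the sketch below computes the standard Calinski-Harabasz index over trees encoded as 0/1 split-indicator vectors. The encoding, the choice of Calinski-Harabasz as the example, and the function names are assumptions for illustration; the paper's own adaptations of validity indices may differ in detail.

```python
from typing import List, Sequence


def sqdist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance; equals RF distance for two 0/1 split vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean, i.e. split frequencies within a group of trees."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def calinski_harabasz(vectors, labels) -> float:
    """Standard Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n = len(vectors)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("the index needs 1 < k < n")
    overall = centroid(vectors)
    between = within = 0.0
    for c in clusters:
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        center = centroid(members)
        between += len(members) * sqdist(center, overall)  # between-cluster spread
        within += sum(sqdist(v, center) for v in members)  # within-cluster spread
    return float("inf") if within == 0 else (between / (k - 1)) / (within / (n - k))


# Four trees encoded over a universe of three splits (1 = split present).
trees = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(calinski_harabasz(trees, [0, 0, 1, 1]))  # 4.0
print(calinski_harabasz(trees, [0, 1, 0, 1]))  # 1.0
```

Higher values indicate tighter, better-separated clusters, so the first partition scores higher than the second. Evaluating the index for several values of k and keeping the best-scoring one is a common way to choose the number of clusters.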

Findings

01. Successfully applied to SARS-CoV-2 and betacoronavirus datasets
02. Identified multiple evolutionary patterns with alternative supertrees
03. Faster than existing tree clustering techniques

Abstract

Each gene has its own evolutionary history, which can differ substantially from the evolutionary histories of other genes. For example, individual genes or operons can be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree, which may display evolutionary patterns different from those of the species tree that accounts for the main patterns of vertical descent. Traditional consensus tree or supertree inference methods output a single consensus tree or supertree. We describe a new efficient method for inferring multiple alternative consensus trees and supertrees that best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some interesting properties of the…
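The idea of running k-means on gene trees can be sketched under explicit assumptions: all trees share one leaf set, each tree is encoded as a 0/1 vector over every split observed in the input, and Lloyd's k-means runs in that space. For 0/1 vectors the squared Euclidean distance between two trees equals their unnormalized RF distance, which is one property that makes plain k-means applicable; whether this is exactly the property the truncated abstract refers to is an assumption. Each centroid coordinate is a split frequency inside its cluster, so keeping splits with frequency above 0.5 gives a majority-rule consensus split set per cluster. All names here are hypothetical and this is not the authors' implementation.

```python
import random
from typing import FrozenSet, Iterable, List, Sequence, Tuple

Split = FrozenSet[str]


def canon(side: Iterable[str], leaves: FrozenSet[str]) -> Split:
    """Orient a bipartition consistently: keep the side without the smallest leaf."""
    side = frozenset(side)
    return leaves - side if min(leaves) in side else side


def encode(trees: Sequence[Sequence[Split]]) -> Tuple[List[Split], List[List[int]]]:
    """Encode each tree as a 0/1 vector over the union of all observed splits."""
    universe = sorted({s for t in trees for s in t}, key=sorted)
    index = {s: i for i, s in enumerate(universe)}
    vectors = []
    for t in trees:
        v = [0] * len(universe)
        for s in t:
            v[index[s]] = 1
        vectors.append(v)
    return universe, vectors


def sqdist(u: Sequence[float], v: Sequence[float]) -> float:
    # For two 0/1 tree vectors this equals their unnormalized RF distance.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_trees(trees, k, iters=100, seed=0):
    """Lloyd's k-means on split vectors; returns labels and per-cluster majority splits."""
    universe, X = encode(trees)
    rng = random.Random(seed)
    centroids = [[float(c) for c in X[i]] for i in rng.sample(range(len(X)), k)]

    def nearest(x):
        return min(range(k), key=lambda j: sqdist(x, centroids[j]))

    labels = None
    for _ in range(iters):
        new_labels = [nearest(x) for x in X]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    # Centroid coordinates are split frequencies within each cluster;
    # splits above 0.5 form a majority-rule consensus split set.
    consensus = [{universe[i] for i, f in enumerate(c) if f > 0.5} for c in centroids]
    return labels, consensus


leaves = frozenset("ABCDE")
g1 = [canon("AB", leaves), canon("DE", leaves)]  # ((A,B),C,(D,E))
g2 = [canon("AC", leaves), canon("BE", leaves)]  # ((A,C),D,(B,E))
labels, consensus = kmeans_trees([g1, g1, g2, g2], k=2)
print(labels)
```

The two copies of each topology end up in the same cluster, and each cluster's consensus split set reproduces its topology, giving two alternative consensus trees instead of one. Splits present in more than half of a cluster's trees are pairwise compatible, which is why the 0.5 threshold yields a valid tree.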


Taxonomy

Topics: Bioinformatics and Genomic Networks · Metabolomics and Mass Spectrometry Studies · Gene expression and cancer classification

Methods: k-Means Clustering