Clustering genes of common evolutionary history

Kevin Gori; Tomasz Suchan; Nadir Alvarez; Nick Goldman; Christophe Dessimoz

arXiv:1510.02356·q-bio.PE·March 10, 2016

TL;DR

This paper evaluates methods for clustering genes that share an evolutionary history, introduces statistical tests for choosing the number of clusters, and reports improved accuracy in phylogenetic analysis.

Contribution

It systematically compares clustering methods for phylogenetic loci and introduces new statistical tests for determining the optimal number of clusters.

Findings

01. Branch length-aware distances perform best

02. Spectral clustering and Ward's method are most effective

03. New statistical tests outperform silhouette criterion
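
The findings above concern three choices: a distance between locus trees, a clustering method, and a criterion for the number of clusters. The following is a minimal Python sketch of how those pieces fit together, not the authors' implementation. It rests on several assumptions: trees are rooted and written as nested tuples, the distance is a topology-only Robinson-Foulds-style count (not one of the branch length-aware distances the findings favour), plain average-linkage clustering stands in for spectral clustering or Ward's method, and the number of clusters is picked with the silhouette criterion rather than the paper's statistical tests. All function names and the toy loci are hypothetical.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set, set of clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds-style distance: clades found in only one tree."""
    return len(clades(t1)[1] ^ clades(t2)[1])


def average_linkage(dist, k):
    """Merge the closest pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)  # a < b, so index a stays valid
    return clusters


def silhouette(dist, clusters):
    """Mean silhouette width; points in singleton clusters score 0."""
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for c in clusters:
        if len(c) == 1:
            continue
        for i in c:
            a = sum(dist[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(sum(dist[i][j] for j in o) / len(o)
                    for o in clusters if o is not c)
            total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(dist)


# Hypothetical toy data: one inferred tree per locus, two underlying topologies.
trees = [
    (("A", "B"), ("C", "D")),
    (("B", "A"), ("D", "C")),
    (("C", "D"), ("A", "B")),
    (("A", "C"), ("B", "D")),
    (("C", "A"), ("D", "B")),
    (("B", "D"), ("A", "C")),
]

# Pairwise tree-to-tree distance matrix over loci.
dist = [[rf_distance(s, t) for t in trees] for s in trees]

# Pick the number of clusters with the highest mean silhouette width.
best_k = max(range(2, 5),
             key=lambda k: silhouette(dist, average_linkage(dist, k)))
print(best_k, average_linkage(dist, best_k))
```

On real data the distance function and the clustering step would be swapped for the better-performing options named in the findings; the surrounding structure (distance matrix, clustering, cluster-number criterion) stays the same.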

Abstract

Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent--due to events such as incomplete lineage sorting or horizontal gene transfer--it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modelling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the…
