Classification of molecular sequence data using Bayesian phylogenetic mixture models
Elisa Loza-Reyes, Merrilee Hurn, Tony Robinson

TL;DR
This paper introduces Bayesian phylogenetic mixture models with allocation variables for classifying sites in molecular sequences, allowing for flexible, data-driven identification of rate variation without prior knowledge of site classes.
Contribution
It presents a novel Bayesian mixture model approach that requires neither prior site classification nor a fixed number of classes, improving site classification in phylogenetics.
Findings
Mixture models outperform traditional rate variation models.
Site classification is achievable directly from model output.
Method effectively identifies structure in sequence alignments.
Abstract
Rate variation among the sites of a molecular sequence is commonly found in applications of phylogenetic inference. Several approaches exist to account for this feature but they do not usually enable the investigator to pinpoint the sites that evolve under one or another rate of evolution in a straightforward manner. The focus is on Bayesian phylogenetic mixture models, augmented with allocation variables, as tools for site classification and quantification of classification uncertainty. The method does not rely on prior knowledge of site membership to classes or even the number of classes. Furthermore, it does not require correlated sites to be next to one another in the sequence alignment, unlike some phylogenetic hidden Markov or change-point models. In the approach presented, model selection on the number and type of mixture components is conducted ahead of both model estimation and…
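As a rough illustration of how allocation variables support site classification, the sketch below computes, for each site, the posterior probability of belonging to each mixture component, given fixed mixture weights and per-site log-likelihoods. This is the standard Bayes-rule calculation for a finite mixture and is not taken from the paper. The function names, the weights, and the toy log-likelihood values are all hypothetical. In a real analysis the per-site likelihoods would come from a phylogenetic likelihood computation on a tree, and the probabilities would be averaged over posterior samples rather than evaluated at one parameter setting.

```python
import math

def allocation_posteriors(site_loglik, weights):
    """Posterior allocation probabilities for each site.

    site_loglik[i][k] is the log-likelihood of site i under component k
    (hypothetical inputs; assumed to be computed elsewhere).
    weights[k] is the mixture weight of component k.
    """
    posteriors = []
    for row in site_loglik:
        # log(w_k) + log f_k(x_i), the unnormalised log posterior
        log_terms = [math.log(w) + ll for w, ll in zip(weights, row)]
        # subtract the maximum before exponentiating, for numerical stability
        m = max(log_terms)
        unnorm = [math.exp(t - m) for t in log_terms]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors

def classify_sites(posteriors):
    """Assign each site to its most probable component."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]

# Toy example: three sites, two components (all values are made up).
weights = [0.7, 0.3]
site_loglik = [
    [-2.0, -5.0],  # favours component 0
    [-6.0, -3.0],  # favours component 1
    [-4.0, -4.0],  # uninformative, so the posterior equals the weights
]
post = allocation_posteriors(site_loglik, weights)
labels = classify_sites(post)
```

The largest entry in each row of `post` gives the classification, and its distance from 1 is one simple measure of classification uncertainty for that site.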
Taxonomy
Topics: Bayesian Methods and Mixture Models · Genomics and Phylogenetic Studies · Genetic diversity and population structure
