Statistical learning with phylogenetic network invariants
Travis Barton, Elizabeth Gross, Colby Long, Joseph Rusinko

TL;DR
This paper introduces a method that combines phylogenetic network invariants with support vector machines to accurately infer 4-leaf phylogenetic networks. It addresses a practical obstacle: invariant residuals (deviations from zero) are rarely exactly zero, even for data generated under the model.
Contribution
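It proposes a new approach that classifies phylogenetic networks from sequence data. The approach evaluates the network invariants at the observed site-pattern frequencies and uses the resulting residuals as input to a machine-learning classifier.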
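The classification step can be pictured with a minimal sketch. It assumes each alignment has already been reduced to a vector of invariant residuals and that each candidate network is encoded as a class label. For simplicity, the sketch uses a two-class linear SVM trained by Pegasos-style subgradient descent in pure Python. The paper's actual kernel, multi-class scheme, features, and software are not stated here. The function names and the toy residual values are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style subgradient descent on the L2-regularized
    hinge loss. X: feature vectors (here, hypothetical invariant residuals);
    y: labels in {-1, +1}, each standing for one candidate network.
    A constant 1 is appended so the last weight acts as a bias term
    (the bias is regularized too, a common simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: move toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 or -1 according to the side of the learned hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# HYPOTHETICAL toy data: absolute residuals of two invariants. Data from
# network +1 nearly satisfy invariant 1; data from network -1 nearly satisfy
# invariant 2. These are illustrative values, not results from the paper.
X = [[0.01, 0.50], [0.02, 0.60], [0.00, 0.55],
     [0.50, 0.01], [0.60, 0.02], [0.55, 0.00]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

To handle more than two candidate networks, this binary classifier would need a one-vs-rest or one-vs-one scheme. In practice, an established library such as scikit-learn's `sklearn.svm.SVC` provides kernel SVMs with built-in multi-class support.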
Findings
Effective classification of 4-leaf networks demonstrated on simulated data
Method successfully applied to primate genetic data
Improves inference accuracy over traditional invariant-based methods
Abstract
Phylogenetic networks provide a means of describing the evolutionary history of sets of species believed to have undergone hybridization or gene flow during their evolution. The mutation process for a set of such species can be modeled as a Markov process on a phylogenetic network. Previous work has shown that site-pattern probability distributions from a Jukes-Cantor phylogenetic network model must satisfy certain algebraic invariants. As a corollary, aspects of the phylogenetic network are theoretically identifiable from site-pattern frequencies. In practice, because of the probabilistic nature of sequence evolution, the phylogenetic network invariants will rarely be satisfied, even for data generated under the model. Thus, using network invariants for inferring phylogenetic networks requires some means of interpreting the residuals, or deviations from zero, when observed…
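Under the Jukes-Cantor model, the probability of a site pattern is unchanged when the four nucleotides are relabeled. A pattern's probability therefore depends only on which leaves share a base. The sketch below assumes a gap-free alignment of four DNA sequences. It collapses each column to that grouping (15 possible classes for 4 leaves), computes empirical site-pattern frequencies, and evaluates residuals of supplied polynomials. The function names are hypothetical, and the example polynomial is a placeholder, not an invariant from the paper.

```python
from collections import Counter
from itertools import product

def canonical_pattern(column):
    """Relabel bases by order of first appearance, e.g. 'GAGT' -> (0, 1, 0, 2).
    Under Jukes-Cantor, columns with the same canonical form are equally likely."""
    labels = {}
    return tuple(labels.setdefault(base, len(labels)) for base in column)

def site_pattern_frequencies(alignment):
    """Empirical frequencies of canonical site patterns.
    alignment: equal-length sequences, one per leaf (assumed gap-free A/C/G/T)."""
    counts = Counter(canonical_pattern(col) for col in zip(*alignment))
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}

def residuals(freqs, polynomials):
    """Evaluate each polynomial at the observed frequencies. A value near 0
    is consistent with the corresponding invariant holding."""
    return [f(freqs) for f in polynomials]

# All site-pattern classes for 4 leaves: the 15 set partitions of the leaves.
ALL_PATTERNS_4 = sorted({canonical_pattern(c) for c in product("ACGT", repeat=4)})

def example_polynomial(p):
    """HYPOTHETICAL placeholder for illustration only, not a real network invariant.
    Patterns never observed are treated as frequency 0."""
    return p.get((0, 0, 1, 1), 0.0) - p.get((0, 1, 0, 1), 0.0)

alignment = ["AACG", "AACG", "ACCG", "ACGT"]  # toy data, one sequence per leaf
freqs = site_pattern_frequencies(alignment)
print(freqs)
print(residuals(freqs, [example_polynomial]))
```

With real data, the placeholder would be replaced by the actual network invariants. The residual vector would then serve as the input to the classifier.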
Taxonomy
Topics: Evolution and Paleontology Studies · Genomics and Phylogenetic Studies · Bayesian Methods and Mixture Models
