Online Bayesian phylogenetic inference: theoretical foundations via Sequential Monte Carlo
Vu Dinh, Aaron E. Darling, Frederick A. Matsen IV

TL;DR
This paper develops a theoretical foundation for online Bayesian phylogenetic inference using Sequential Monte Carlo, enabling efficient updates of evolutionary trees as new genetic data becomes available.
Contribution
It provides the first theoretical analysis of online Bayesian phylogenetics, including consistency results and bounds on likelihood surface changes with new data.
Findings
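The SMC sampler is consistent: it samples from the correct posterior distribution in the limit of a large number of particles.
The effective sample size (ESS) is bounded from below by a quantity that grows linearly with the number of particles.
Bounds on how the likelihood surface changes when new sequences are added make this performance analysis possible.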
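To make the ESS finding concrete, here is a minimal sketch of how ESS is commonly computed from importance weights in an SMC sampler. It uses the standard Kish estimator, (sum of weights)^2 / (sum of squared weights); that the paper uses exactly this definition is an assumption, and the function name `effective_sample_size` is hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish ESS from unnormalized log importance weights (assumed definition)."""
    if not log_weights:
        raise ValueError("need at least one particle")
    # Subtract the maximum before exponentiating for numerical stability.
    max_lw = max(log_weights)
    weights = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


if __name__ == "__main__":
    # Equal weights: every particle contributes, so ESS equals the particle count.
    print(effective_sample_size([0.0] * 100))
    # One dominant particle: ESS collapses towards 1.
    print(effective_sample_size([0.0] + [-50.0] * 99))
```

With equal weights the ESS equals the number of particles, and with one dominant weight it falls towards 1. A linear lower bound on ESS therefore says that adding particles keeps paying off rather than being swamped by weight degeneracy.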
Abstract
Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are quickly adding new sequences to already substantial databases. With all current techniques for Bayesian phylogenetics, computation must start anew each time a sequence becomes available, making it costly to maintain an up-to-date estimate of a phylogenetic posterior. These considerations highlight the need for an online Bayesian phylogenetic method which can update an existing posterior with new sequences. Here we provide theoretical results on the consistency and stability of methods for…
