Fast Convergence of MCMC Algorithms for Phylogenetic Reconstruction with Homogeneous Data on Closely Related Species
Daniel Stefankovic, Eric Vigoda

TL;DR
This paper proves that a Markov chain using SPR moves converges rapidly to its stationary distribution, the posterior distribution over tree topologies, in a simplified setting with homogeneous data and very short edge lengths. This contrasts with the slow convergence seen on heterogeneous data.
Contribution
It provides a combinatorial proof that SPR-based Markov chains rapidly converge in a simplified, homogeneous data setting, highlighting conditions for fast phylogenetic reconstruction.
Findings
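The Markov chain with SPR moves is rapidly mixing when the generating tree has very short edges and there are enough samples (characters) for the data to resemble the generating distribution.
The leading term of the likelihood function of a tree T is its maximum parsimony score.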
Heterogeneous data can cause exponential slowdowns in convergence.
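To make the parsimony score in the findings above concrete, here is a minimal sketch that computes it for a single character on a rooted binary tree. It uses Fitch's algorithm, a standard method for this task that is swapped in for illustration and is not taken from the paper. The nested-tuple tree encoding, the function names `fitch_score` and `parsimony_score`, and the four-leaf example are all assumptions made for this sketch.

```python
from typing import Dict, Iterable, Tuple, Union

# Hypothetical encoding: a tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, character: Dict[str, str]) -> int:
    """Minimum number of state changes needed to explain one character
    (a mapping from leaf name to state) on a rooted binary tree."""

    def visit(node):
        # A leaf contributes its observed state and no cost.
        if isinstance(node, str):
            return {character[node]}, 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        common = left_states & right_states
        if common:
            # The children can agree on a state: no extra change here.
            return common, left_cost + right_cost
        # The children disagree: one change is forced at this node.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree: Tree, characters: Iterable[Dict[str, str]]) -> int:
    """Total parsimony score of a tree: the sum over all characters."""
    return sum(fitch_score(tree, c) for c in characters)


# Illustrative data (assumed): a character with a single mutation on the
# edge separating {a, b} from {c, d}.
generating_tree = (("a", "b"), ("c", "d"))
other_tree = (("a", "c"), ("b", "d"))
character = {"a": "1", "b": "1", "c": "0", "d": "0"}

print(fitch_score(generating_tree, character))  # tree contains the split
print(fitch_score(other_tree, character))       # tree does not contain it
```

In this example the character needs one change on the tree that contains the split {a, b} | {c, d} and two changes on the tree that does not, which illustrates why single-mutation characters favor the generating topology under parsimony.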
Abstract
This paper studies a Markov chain for phylogenetic reconstruction which uses a popular transition between tree topologies known as subtree pruning-and-regrafting (SPR). We analyze the Markov chain in the simpler setting that the generating tree consists of very short edge lengths, short enough so that each sample from the generating tree (or character in phylogenetic terminology) is likely to have only one mutation, and that there are enough samples so that the data looks like the generating distribution. We prove in this setting that the Markov chain is rapidly mixing, i.e., it quickly converges to its stationary distribution, which is the posterior distribution over tree topologies. Our proofs use that the leading term of the maximum likelihood function of a tree T is the maximum parsimony score, which is the size of the minimum cut in T needed to realize single edge cuts of the…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms
