Posterior bounds on divergence time of two sequences under dependent-site evolutionary models
Joseph Mathews, Scott C. Schmidler

TL;DR
This paper derives concentration bounds on the posterior distribution of the divergence time between two DNA sequences under dependent-site evolutionary models, clarifying how accurately divergence time can be estimated in phylogenetics.
Contribution
It establishes a posterior concentration bound for the divergence time under dependent-site models, showing that the posterior concentrates within a logarithmic factor of the p-distance. This extends previous results to more complex evolutionary scenarios.
Findings
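The posterior distribution of T concentrates within a logarithmic factor of the p-distance p when d(x,y)log(n)/n = o(1).
Under models with constant mutation rates, T exceeds p with vanishingly small posterior probability as the sequence length n increases.
The bounds hold for a large class of evolutionary models, including many standard models with site dependence.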
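The constant-rate finding can be checked numerically in the simplest setting. The sketch below uses the Jukes-Cantor (JC69) model, a site-independent, constant-rate model chosen here only for illustration; it is not the dependent-site setting or the proof technique of the paper. It assumes that x and y descend from a common ancestor along two lineages of length T each, with unit substitution rate, so a site differs with probability q(T) = (3/4)(1 - exp(-8T/3)). It also assumes an Exponential(1) prior on T and a posterior evaluated on a grid truncated at t_max. Here p is the p-distance d(x,y)/n, and all function names and parameters are hypothetical.

```python
import math


def jc_mismatch_prob(t: float) -> float:
    """Per-site probability that x and y differ under JC69 (assumed setup):
    two lineages of length t each, unit substitution rate, total path 2t."""
    return 0.75 * (1.0 - math.exp(-8.0 * t / 3.0))


def posterior_tail_above(d: int, n: int, threshold: float,
                         prior_rate: float = 1.0, t_max: float = 1.0,
                         grid_size: int = 100_000) -> float:
    """Grid approximation of P(T > threshold | d mismatches in n sites)
    with an Exponential(prior_rate) prior on T truncated to (0, t_max]."""
    step = t_max / grid_size
    ts = [(i + 1) * step for i in range(grid_size)]  # skip t = 0, where q = 0
    log_post = []
    for t in ts:
        q = jc_mismatch_prob(t)
        # Sites are independent under JC69, so the log-likelihood is binomial.
        loglik = d * math.log(q) + (n - d) * math.log1p(-q)
        log_post.append(loglik - prior_rate * t)  # add log prior, up to a constant
    peak = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(v - peak) for v in log_post]
    tail = sum(w for t, w in zip(ts, weights) if t > threshold)
    return tail / sum(weights)


# Keep p = d / n fixed at 0.01 while the sequence length n grows.
for n, d in [(100, 1), (1_000, 10), (10_000, 100)]:
    print(n, posterior_tail_above(d, n, threshold=d / n))
```

With p held fixed, the posterior mass above p should shrink toward zero as n grows, which is the behavior the constant-rate finding describes. The grid truncation at t_max matters only when p approaches the JC69 saturation level of 3/4.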
Abstract
Let x and y be two length-n DNA sequences, and suppose we would like to estimate their divergence time T. A well-known, simple but crude estimate of T is p := d(x,y)/n, the fraction of mutated sites (the p-distance). We establish a posterior concentration bound on T, showing that the posterior distribution of T concentrates within a logarithmic factor of p when d(x,y)log(n)/n = o(1). Our bounds hold under a large class of evolutionary models, including many standard models that incorporate site dependence. As a special case, we show that T exceeds p with vanishingly small posterior probability as n increases under models with constant mutation rates, complementing the result of Mihaescu and Steel (Appl Math Lett 23(9):975–979, 2010). Our approach is based on bounding sequence transition probabilities in various convergence regimes of the underlying evolutionary process. Our result may be…
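The p-distance and the regime condition from the abstract are straightforward to compute. Below is a minimal sketch, assuming equal-length, already aligned sequences and natural logarithms. The function names are hypothetical and not taken from the paper. Because o(1) is an asymptotic statement, a single value of the statistic can only suggest, not certify, that the condition holds.

```python
import math


def hamming(x: str, y: str) -> int:
    """d(x, y): number of aligned sites at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("x and y must both have length n")
    return sum(a != b for a, b in zip(x, y))


def p_distance(x: str, y: str) -> float:
    """p := d(x, y) / n, the fraction of mutated sites."""
    if not x:
        raise ValueError("sequences must be non-empty")
    return hamming(x, y) / len(x)


def regime_statistic(x: str, y: str) -> float:
    """d(x, y) * log(n) / n; the bound needs this to be o(1) as n grows."""
    n = len(x)
    if n < 2:
        raise ValueError("need n >= 2")
    return hamming(x, y) * math.log(n) / n


x = "ACGTACGTAC"
y = "ACGTTCGTAA"
print(hamming(x, y), p_distance(x, y))  # 2 0.2
```

The two sequences differ at the fifth and tenth sites, so d(x,y) = 2 and p = 2/10.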
