Consistency and convergence rate of phylogenetic inference via regularization
Vu Dinh, Lam Si Tung Ho, Marc A. Suchard, Frederick A. Matsen IV

TL;DR
This paper studies a regularized maximum likelihood method for gene tree reconstruction in phylogenetics, proving its consistency, deriving its convergence rate, and showing that it can accurately recover trees from a polynomial amount of sequence data.
Contribution
The paper develops a penalized likelihood approach whose penalty is based on the geodesic distance from the gene tree to the species tree, and establishes the estimator's theoretical properties, including consistency and convergence rate.
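As a rough sketch of what such a penalized estimator can look like (the notation is an assumption for illustration, not taken from the paper): let $\ell_k(T)$ be the log-likelihood of a candidate gene tree $T$ given $k$ sequence sites, $T_s$ the (possibly approximate) species tree, $d$ the geodesic distance in tree space, and $\lambda_k \ge 0$ a tuning weight. One natural form is then:

```latex
\hat{T}_k \;=\; \operatorname*{arg\,max}_{T}
  \Big\{ \tfrac{1}{k}\,\ell_k(T) \;-\; \lambda_k \, d(T, T_s)^2 \Big\}
```

Squaring the distance is assumed here as one common choice of penalty; setting $\lambda_k = 0$ recovers ordinary maximum likelihood, while larger $\lambda_k$ pulls the estimate toward the species tree.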
Findings
Method is consistent for gene tree reconstruction.
Estimator converges rapidly for edges longer than a threshold.
Works with approximate species trees, not just exact ones.
Abstract
It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the…
