Large-scale Species Tree Estimation
Erin Molloy, Tandy Warnow

TL;DR
This paper reviews recent methods for large-scale species tree estimation, emphasizing approaches that handle gene tree heterogeneity caused by incomplete lineage sorting and strategies for analyzing extensive multi-locus datasets.
Contribution
The paper reviews recent species tree estimation methods and divide-and-conquer strategies, including approaches based on algorithms such as TreeMerge, with a focus on datasets containing many species or loci.
Findings
Overview of recent methods addressing gene tree heterogeneity
Discussion of divide-and-conquer strategies for large datasets
Coverage of new approaches based on algorithms such as TreeMerge
Abstract
Species tree estimation is a complex problem, due to the fact that different parts of the genome can have different evolutionary histories than the genome itself. One of the causes for this discord is incomplete lineage sorting (also called deep coalescence), which is a population-level process that produces gene trees that differ from the species tree. The last decade has seen a large number of new methods developed to estimate species trees from multi-locus datasets, specifically addressing this cause of gene tree heterogeneity. In this paper, we review these methods, focusing mainly on issues that relate to analyses of datasets containing large numbers of species or loci (or both). We also discuss divide-and-conquer strategies for enabling species tree estimation methods to run on large datasets, including new approaches that are based on algorithms (such as TreeMerge) for the…
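To make the effect of incomplete lineage sorting concrete, here is a minimal sketch that is not taken from the paper. It uses the standard three-taxon result under the multispecies coalescent: for a rooted species tree ((A,B),C) whose internal branch has length t in coalescent units, the matching gene tree topology has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The function name and taxon labels are illustrative assumptions.

```python
import math


def gene_tree_probabilities(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (hypothetical input;
    the taxon labels A, B, C are placeholders for illustration).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the internal
    # branch is exp(-t); the three topologies are then equally likely.
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        probs = gene_tree_probabilities(t)
        print(f"t={t}: concordant={probs['((A,B),C)']:.3f}")
```

Short internal branches push the concordant probability toward 1/3, which is one way to see why estimating a species tree from many loci needs methods that account for gene tree heterogeneity.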
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Identification and Quantification in Food
