Species tree estimation under joint modeling of coalescence and duplication: sample complexity of quartet methods
Max Hill, Brandon Legried, Sebastien Roch

TL;DR
This paper analyzes the sample complexity of quartet-based methods for species tree estimation under a model that includes both incomplete lineage sorting and gene duplication/loss, providing bounds that depend on duplication and loss rates.
Contribution
The paper gives a probabilistic analysis of species tree estimation under a model that combines the coalescent with gene duplication and loss, and derives sample complexity bounds for quartet-based methods.
Findings
Sample complexity bounds depend on duplication and loss rates.
The bounds cover both the subcritical and supercritical regimes of the duplication and loss process.
The analysis highlights the impact of gene duplication and loss on inference accuracy.
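As a rough illustration of the two regimes, the sketch below assumes duplication and loss follow a linear birth-death process in which each gene copy duplicates at rate lam and is lost at rate mu. These symbols, the function names, and the convention that "subcritical" means lam < mu are assumptions borrowed from standard branching process terminology, not from the paper's own notation. Under that assumption, the expected number of copies descending from a single copy after time t is exp((lam - mu) * t).

```python
import math


def expected_copies(lam: float, mu: float, t: float) -> float:
    """Expected copy count at time t, starting from one copy, in a linear
    birth-death process with per-copy duplication rate lam and loss rate mu
    (hypothetical parameter names)."""
    return math.exp((lam - mu) * t)


def regime(lam: float, mu: float) -> str:
    """Classify the process by comparing duplication and loss rates."""
    if lam < mu:
        return "subcritical"    # expected copy number decays over time
    if lam > mu:
        return "supercritical"  # expected copy number grows over time
    return "critical"           # expected copy number stays at 1


if __name__ == "__main__":
    for lam, mu in [(0.5, 1.0), (1.0, 1.0), (1.5, 1.0)]:
        print(regime(lam, mu), expected_copies(lam, mu, t=2.0))
```

The sketch only tracks the mean copy number; it says nothing about the sample complexity bounds themselves.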
Abstract
We consider species tree estimation under a standard stochastic model of gene tree evolution that incorporates incomplete lineage sorting (as modeled by a coalescent process) and gene duplication and loss (as modeled by a branching process). Through a probabilistic analysis of the model, we derive sample complexity bounds for widely used quartet-based inference methods that highlight the effect of the duplication and loss rates in both subcritical and supercritical regimes.
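To make the idea of a quartet-based method concrete, here is a minimal sketch of the basic plurality step for a single set of four species A, B, C, D. It assumes each gene tree has already been reduced to one of the three possible unrooted quartet topologies; the labels and the function name `plurality_quartet` are hypothetical, and this is not presented as the specific procedure analyzed in the paper. Informally, sample complexity then asks how many gene trees are needed before this vote returns the species tree topology with high probability.

```python
from collections import Counter

# The three unrooted topologies on four taxa A, B, C, D (hypothetical labels).
TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")


def plurality_quartet(observed):
    """Return the quartet topology seen most often across gene trees.

    `observed` is an iterable of topology labels, one per gene tree.
    Ties are broken by the order of TOPOLOGIES.
    """
    counts = Counter(observed)
    unknown = set(counts) - set(TOPOLOGIES)
    if unknown:
        raise ValueError(f"unrecognized topologies: {sorted(unknown)}")
    return max(TOPOLOGIES, key=lambda topo: counts[topo])


if __name__ == "__main__":
    gene_tree_quartets = ["AB|CD", "AC|BD", "AB|CD", "AD|BC", "AB|CD"]
    print(plurality_quartet(gene_tree_quartets))
```

With duplication and loss, a gene family can contain several copies per species, so an actual method must also decide which copies to use when extracting each quartet; that step is left out of this sketch.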
