Covariance Decomposition for Distance Based Species Tree Estimation
Georgios Aliatimis, Ruriko Yoshida, Burak Boyaci, James A. Grant

TL;DR
This paper derives the covariance matrix of pairwise species-distance estimates under a combined coalescent and substitution model, enabling improved confidence estimation in species tree inference.
Contribution
It provides an exact covariance decomposition for distance estimates, revealing dominant noise sources under different mutation regimes, and introduces a Gaussian-sampling method for reliable confidence assessment.
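As a rough illustration of how a split of this kind can arise, the law of total covariance separates the covariance of an estimator into a within-gene-tree part and a between-gene-tree part. The notation below is assumed for illustration only ($\hat{d}$ for the vector of pairwise distance estimates, $G$ for the collection of gene trees); whether the paper's decomposition takes exactly this form is not stated in this summary.

```latex
\operatorname{Cov}(\hat{d})
  = \underbrace{\mathbb{E}\!\left[\operatorname{Cov}(\hat{d} \mid G)\right]}_{\text{substitutional (finite-sequence) noise}}
  + \underbrace{\operatorname{Cov}\!\left(\mathbb{E}[\hat{d} \mid G]\right)}_{\text{coalescent (gene-tree) variation}}
```

Under this reading, the first term vanishes when distances are known exactly given the gene trees, and the second vanishes when all loci share one gene tree.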
Findings
Substitutional noise dominates at both very low and very high mutation rates.
Coalescent variance is primary at intermediate mutation rates.
The Gaussian-sampling approach improves confidence estimate reliability.
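To make the idea of Gaussian sampling for confidence assessment concrete, here is a minimal sketch rather than the authors' procedure. It assumes a three-taxon problem, a mean distance vector, and a covariance matrix whose values are invented for illustration; the function name `first_merge_support` is hypothetical. It draws distance vectors from a multivariate normal and records how often each pair is the closest, which is the first merge an agglomerative method would make.

```python
import numpy as np


def first_merge_support(mean_dist, cov, pairs, n_samples=2000, seed=0):
    """Fraction of Gaussian draws in which each pair has the smallest distance."""
    rng = np.random.default_rng(seed)
    # Each row is one sampled vector of pairwise distances.
    draws = rng.multivariate_normal(mean_dist, cov, size=n_samples)
    # Index of the closest pair in every draw (the first agglomerative merge).
    winners = np.argmin(draws, axis=1)
    counts = np.bincount(winners, minlength=len(pairs))
    return {pair: counts[i] / n_samples for i, pair in enumerate(pairs)}


# Illustrative (made-up) inputs: estimated distances and their covariance.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
mean_dist = np.array([0.10, 0.30, 0.32])
cov = np.array([
    [0.004, 0.001, 0.001],
    [0.001, 0.006, 0.003],
    [0.001, 0.003, 0.006],
])

support = first_merge_support(mean_dist, cov, pairs)
for pair, frac in support.items():
    print(pair, round(float(frac), 3))
```

The support values sum to one, and a pair whose mean distance is well separated from the others relative to the covariance receives support close to one. A full method would apply the clustering to every draw and tally whole clades, not just the first merge.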
Abstract
In phylogenomics, species-tree methods must contend with two major sources of noise: stochastic gene-tree variation under the multispecies coalescent (MSC) model and finite-sequence substitutional noise. Fast agglomerative methods such as GLASS, STEAC, and METAL combine multi-locus information via distance-based clustering. We derive the exact covariance matrix of these pairwise distance estimates under a joint MSC-plus-substitution model and leverage it for reliable confidence estimation, and we algebraically decompose it into components attributable to coalescent variation versus sequence-level stochasticity. Our theory identifies parameter regimes where one source of variance greatly exceeds the other. For both very low and very high mutation rates, substitutional noise dominates, while coalescent variance is the primary contributor at intermediate mutation rates. Moreover, the…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Evolution and Paleontology Studies
