# On the variance of internode distance under the multispecies coalescent

**Authors:** Sebastien Roch

arXiv: 1812.08357 · 2018-12-21

## TL;DR

This paper studies the variance of internode distances, the statistic underlying a popular class of species tree estimation methods under incomplete lineage sorting. It proves a lower bound on the worst-case variance and discusses implications for algorithm design.

## Contribution

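It derives a lower bound on the worst-case variance of internode distances under the multispecies coalescent. The bound grows linearly with the graph distance between the corresponding species in the species tree, which is a first step toward understanding the sample complexity of internode distance methods.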
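The short derivation below makes the link to sample complexity concrete. It rests on assumptions that are ours and are not quoted from the paper. Loci are independent, so the gene tree topologies $T_1,\dots,T_m$ are i.i.d. draws from the multispecies coalescent. The lower bound is also written only schematically, with an unspecified constant $c>0$ and $V^{*}(a,b)$ standing for the worst-case per-locus variance. The precise statement, and the class of species trees over which the worst case is taken, are in the full paper.

$$
\hat D_m(a,b) = \frac{1}{m}\sum_{i=1}^{m} d_{T_i}(a,b),
\qquad
\mathrm{Var}\big[\hat D_m(a,b)\big] = \frac{\mathrm{Var}\big[d_{T}(a,b)\big]}{m}.
$$

$$
V^{*}(a,b) \ge c\, d_S(a,b)
\quad\Longrightarrow\quad
\sqrt{\frac{V^{*}(a,b)}{m}} \ge \sqrt{\frac{c\, d_S(a,b)}{m}}.
$$

Under these assumptions, keeping the worst-case standard deviation of $\hat D_m(a,b)$ at or below $\varepsilon$ requires $m \ge c\, d_S(a,b)/\varepsilon^{2}$ loci. The number of loci needed therefore grows at least linearly with the species-tree graph distance $d_S(a,b)$.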

## Key findings

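- The worst-case variance of the internode distance between two species is bounded below by a quantity that depends linearly on their graph distance in the species tree.
- The bound has implications for the design and analysis of internode distance-based reconstruction methods.
- Earlier results established statistical consistency of such methods, in some cases, only in the limit of many loci. This bound is a first step toward their sample complexity.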

## Abstract

We consider the problem of estimating species trees from unrooted gene tree topologies in the presence of incomplete lineage sorting, a common phenomenon that creates gene tree heterogeneity in multilocus datasets. One popular class of reconstruction methods in this setting is based on internode distances, i.e. the average graph distance between pairs of species across gene trees. While statistical consistency in the limit of large numbers of loci has been established in some cases, little is known about the sample complexity of such methods. Here we make progress on this question by deriving a lower bound on the worst-case variance of internode distance which depends linearly on the corresponding graph distance in the species tree. We also discuss some algorithmic implications.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.08357/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1812.08357/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1812.08357/full.md

---
Source: https://tomesphere.com/paper/1812.08357