The mean and variance of phylogenetic diversity under rarefaction
David A. Nipperess, Frederick A. Matsen IV

TL;DR
This paper derives exact analytical formulas for the mean and variance of phylogenetic diversity under rarefaction, enabling more efficient and accurate comparisons of diversity across samples with different depths.
Contribution
The authors present the first exact formulas for the mean and variance of phylogenetic diversity under rarefaction, validated against Monte Carlo simulations.
Findings
Analytical formulas match Monte Carlo estimates closely.
Rarefaction alters the perceived hotspots of diversity.
The method is more efficient than repeated subsampling.
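The first and third findings can be illustrated with a small sketch. It is not the authors' code. It assumes the rooted definition of PD (the sum of the lengths of all branches with at least one sampled individual below them) and sampling of m individuals without replacement, in which case the expected PD is the sum over branches of branch length times the probability that the branch is represented in the subsample. The toy tree, the abundances, and the function names `expected_pd` and `monte_carlo_pd` are all hypothetical. Only the mean is covered here, not the variance.

```python
import random
from math import comb

# Hypothetical toy tree ((A:1,B:1):1.5,(C:2,D:2):0.5), stored as
# (branch length, set of tips descended from that branch).
BRANCHES = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (2.0, {"C"}),
    (2.0, {"D"}),
    (1.5, {"A", "B"}),
    (0.5, {"C", "D"}),
]

# Hypothetical individual counts per tip (20 individuals in total).
COUNTS = {"A": 10, "B": 3, "C": 1, "D": 6}


def expected_pd(branches, counts, m):
    """Exact mean rooted PD when m individuals are drawn without replacement."""
    n_total = sum(counts.values())
    total = 0.0
    for length, tips in branches:
        n_below = sum(counts[t] for t in tips)
        # Hypergeometric probability that no drawn individual sits below
        # this branch; math.comb returns 0 when m > n_total - n_below.
        p_absent = comb(n_total - n_below, m) / comb(n_total, m)
        total += length * (1.0 - p_absent)
    return total


def monte_carlo_pd(branches, counts, m, reps=20000, seed=0):
    """Mean rooted PD estimated by repeated random subsampling."""
    rng = random.Random(seed)
    # One pool entry per individual, labelled by its tip.
    pool = [tip for tip, c in counts.items() for _ in range(c)]
    total = 0.0
    for _ in range(reps):
        present = set(rng.sample(pool, m))
        total += sum(length for length, tips in branches if tips & present)
    return total / reps


for depth in (1, 5, 10, 20):
    exact = expected_pd(BRANCHES, COUNTS, depth)
    approx = monte_carlo_pd(BRANCHES, COUNTS, depth)
    print(f"m={depth:2d}  exact={exact:.4f}  monte_carlo={approx:.4f}")
```

The exact calculation visits each branch once per rarefaction depth, whereas the Monte Carlo estimate must redraw and rescore the subsample thousands of times to reach comparable precision.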
Abstract
Phylogenetic diversity (PD) depends on sampling intensity, which complicates the comparison of PD between samples of different depth. One approach to dealing with differing sample depth for a given diversity statistic is to rarefy, which means to take a random subset of a given size of the original sample. Exact analytical formulae for the mean and variance of species richness under rarefaction have existed for some time but no such solution exists for PD. We have derived exact formulae for the mean and variance of PD under rarefaction. We show that these formulae are correct by comparing the exact-solution mean and variance to those calculated by repeated random (Monte Carlo) subsampling of a dataset of stem counts of woody shrubs of Toohey Forest, Queensland, Australia. We also demonstrate the application of the method using two examples: identifying hotspots of mammalian diversity in…
Taxonomy
Topics: Wildlife Ecology and Conservation · Gut microbiota and health · Reproductive tract infections research
