Quantifying and mitigating the effect of preferential sampling on phylodynamic inference
Michael D. Karcher, Julia A. Palacios, Trevor Bedford, Marc A. Suchard, Vladimir N. Minin

TL;DR
This paper identifies biases in phylodynamic inference caused by preferential sampling and introduces a new model that explicitly accounts for sampling dependence on population size, improving accuracy and precision.
Contribution
The paper proposes a novel model that explicitly incorporates preferential sampling in phylodynamics, reducing bias and enhancing estimation accuracy.
Findings
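Simulations show that existing estimation methods may be systematically biased when sampling times depend probabilistically on effective population size.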
The new model reduces bias and improves estimation precision.
Application to influenza data demonstrates practical benefits.
Abstract
Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for…
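The abstract's premise, that sampling times may depend probabilistically on effective population size, can be illustrated with a small simulation. The sketch below is not the paper's model or code. It assumes a hypothetical seasonal trajectory (`effective_pop_size`) and a log-linear sampling intensity exp(beta0) * Ne(t)^beta1, and it draws sampling times from an inhomogeneous Poisson process by thinning. All function and parameter names are illustrative.

```python
import math
import random


def effective_pop_size(t):
    """Hypothetical trajectory (assumption): seasonal cycle between 50 and 250."""
    return 100.0 * (1.5 + math.sin(2.0 * math.pi * t))


def sampling_intensity(t, beta0, beta1):
    """Assumed log-linear intensity: exp(beta0) * Ne(t) ** beta1."""
    return math.exp(beta0) * effective_pop_size(t) ** beta1


def simulate_sampling_times(t_end, beta0, beta1, rng):
    """Draw sampling times on [0, t_end] by thinning a homogeneous Poisson process."""
    # Ne(t) stays within [50, 250] for the trajectory above, so the larger of
    # the two endpoint values bounds the intensity for any sign of beta1.
    bound = math.exp(beta0) * max(250.0 ** beta1, 50.0 ** beta1)
    times = []
    t = 0.0
    while True:
        # Propose the next candidate time at the constant bounding rate.
        t += rng.expovariate(bound)
        if t > t_end:
            break
        # Accept the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= sampling_intensity(t, beta0, beta1):
            times.append(t)
    return times


if __name__ == "__main__":
    rng = random.Random(1)
    preferential = simulate_sampling_times(2.0, 0.0, 1.0, rng)  # beta1 = 1
    uniform = simulate_sampling_times(2.0, 5.0, 0.0, rng)       # beta1 = 0
    print(len(preferential), len(uniform))
```

With beta1 = 0 the intensity is constant and sampling times carry no information about the trajectory. With beta1 > 0 samples cluster where the assumed population size is large, which is the situation the abstract describes as a source of bias for methods that ignore the dependence.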
