Efficient Sampling in Disease Surveillance through Subpopulations: Sampling Canaries in the Coal Mine
Ivo V. Stoepker

TL;DR
This paper proposes an optimized stratified sampling approach for disease outbreak detection, demonstrating that sampling from higher-risk subpopulations enhances efficiency, supported by theoretical analysis and a COVID-19 case study.
Contribution
It establishes a theoretical relationship between sampling efficiency and baseline disease risk in subpopulations, guiding targeted surveillance strategies.
Findings
Under some assumptions, the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their baseline disease risks.
Sampling from higher-risk subpopulations improves outbreak detection effectiveness.
A case study using COVID-19 data from the Netherlands confirms the theoretical predictions.
Abstract
We consider outbreak detection settings of endemic diseases where the population under study consists of various subpopulations available for stratified surveillance. These subpopulations can, for example, be based on age cohorts, but may also correspond to other subgroups of the population under study, such as international travellers. Rather than sampling uniformly across the population, one may elevate the effectiveness of the detection methodology by optimally choosing a sampling subpopulation. We show (under some assumptions) that the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks. This implies one can increase sampling efficiency by sampling from the subpopulation with higher baseline disease risk. Our results require careful treatment of the power curves of exact binomial tests as a function of…
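The sample-size claim in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes a one-sided exact binomial test, an outbreak that multiplies the baseline risk by the same factor in both subpopulations, and purely illustrative values (baseline risks 0.01 and 0.02, relative risk 2, level 0.05, target power 0.8). All function names are hypothetical. It searches for the smallest sample size that reaches the target power in each subpopulation and prints the ratio of the two.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p) with 0 < p < 1, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def critical_value(n, p0, alpha):
    """Smallest c such that P(X >= c) <= alpha under the baseline risk p0."""
    cdf = 0.0  # running value of P(X <= c - 1)
    for c in range(n + 1):
        if 1.0 - cdf <= alpha:
            return c
        cdf += binom_pmf(c, n, p0)
    return n + 1  # the test cannot reject at this sample size


def power(n, p0, p1, alpha):
    """Probability that the exact test rejects when the true risk is p1."""
    c = critical_value(n, p0, alpha)
    return max(0.0, 1.0 - sum(binom_pmf(k, n, p1) for k in range(c)))


def min_sample_size(p0, rel_risk, alpha, target, n_max=20000):
    """First n whose power reaches the target.

    A linear scan is used because the power of an exact binomial test
    is not monotone in n.
    """
    for n in range(1, n_max + 1):
        if power(n, p0, rel_risk * p0, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    # Illustrative assumptions only; none of these values come from the paper.
    alpha, target, rel_risk = 0.05, 0.8, 2.0
    n_low = min_sample_size(0.01, rel_risk, alpha, target)   # lower-risk subpopulation
    n_high = min_sample_size(0.02, rel_risk, alpha, target)  # higher-risk subpopulation
    print("lower-risk subpopulation needs n =", n_low)
    print("higher-risk subpopulation needs n =", n_high)
    print("sample size ratio =", n_low / n_high)
```

Because the exact test's critical value changes in discrete steps, the power curve is sawtooth-shaped in the sample size, so the ratio printed here should only be expected to be close to the inverse risk ratio, not exactly equal to it.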
