Doubly Robust Machine Learning for Population Size Estimation with Missing Covariates: Application to Gaza Conflict Mortality
Mateo Dulce Rubio, Edward H. Kennedy, Nicholas P. Jewell

TL;DR
This paper introduces a doubly robust machine learning framework for population size estimation with missing covariates, improving accuracy and validity in challenging data collection scenarios like conflict zones.
Contribution
It develops a novel nonparametric estimation method that combines efficiency, robustness, and flexibility using machine learning under a Missing at Random assumption.
Findings
Simulations show substantial improvements over naive imputation methods.
The method maintains valid inference at high missingness rates.
Application to Gaza conflict data estimates excess mortality by 26%.
Abstract
Population size estimation from capture-recapture data is central for studying hard-to-reach populations, incorporating auxiliary covariates to account for heterogeneous capture probabilities and recapture dependencies. However, missing attributes pose a critical methodological challenge due to reluctance to share sensitive information, data collection limitations, and imperfect record linkage. Existing approaches either ignore missingness or rely on a priori imputation, potentially introducing substantial bias. In this work, we develop a novel nonparametric estimation framework using a Missing at Random assumption to identify capture probabilities under missing covariates. Using semiparametric efficiency theory, we construct one-step estimators that combine efficiency, robustness, and finite-sample validity: they approximately achieve the nonparametric efficiency bound, accommodate…
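For readers new to capture-recapture, the basic idea the abstract builds on can be sketched with the classical two-list Lincoln-Petersen estimator and its stratified variant. This is a textbook baseline, not the doubly robust one-step estimator developed in the paper. The function names, the example counts, and the stratum labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical two-list estimate N_hat = n1 * n2 / m.

    n1, n2: number of individuals recorded on list 1 and list 2.
    m: number of individuals recorded on both lists.
    Assumes the two lists capture individuals independently.
    """
    if m <= 0:
        raise ValueError("need at least one individual on both lists")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant of the two-list estimate."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def stratified_estimate(strata: Dict[str, Tuple[int, int, int]]) -> float:
    """Sum per-stratum Lincoln-Petersen estimates.

    Stratifying on a covariate is a simple way to allow capture
    probabilities to differ across groups. It requires the covariate
    to be observed for every record, which is exactly what fails when
    covariates are missing.
    """
    return sum(lincoln_petersen(n1, n2, m) for n1, n2, m in strata.values())


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(lincoln_petersen(200, 150, 30))
    print(stratified_estimate({"group_a": (120, 100, 24), "group_b": (80, 50, 5)}))
```

With these made-up counts the pooled and stratified estimates differ, which is the kind of heterogeneity that motivates using covariates; the stratified version cannot be computed for records whose stratum is unknown, which is the gap the paper's Missing at Random framework addresses.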
Taxonomy
Topics: Census and Population Estimation · Data Quality and Management · Data-Driven Disease Surveillance
