Machine learning methods for finite population parameter estimation in survey sampling
Mehdi Dagdoug, David Haziza

TL;DR
This pedagogical review explores how machine learning can improve finite-population survey inference, the challenges it poses for design-based validity, and remedies such as cross-fitting and Neyman-orthogonal estimation.
Contribution
It adapts double/debiased machine learning techniques to survey sampling, allowing valid inference with high-dimensional or nonparametric learners under suitable conditions.
Findings
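For model-assisted estimation and item nonresponse, cross-fitting and Neyman-orthogonal estimating equations preserve root-n consistency and asymptotic normality under suitable conditions, even with flexible machine learning predictors.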
Outcome-agnostic inverse-probability weighting remains operationally attractive for unit nonresponse.
The review also covers the use of machine learning in small area estimation and data integration.
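To make "outcome-agnostic" in the inverse-probability weighting finding concrete, here is a minimal sketch, not taken from the paper. It assumes response propensities are estimated as design-weighted response rates within weighting cells, a simple stand-in for a learned propensity model. The function name `ipw_weights` and the toy data are hypothetical.

```python
import numpy as np

def ipw_weights(d, responded, cell):
    """Nonresponse-adjusted weights: design weight / estimated response propensity.

    Assumption: the propensity is the design-weighted response rate within
    each weighting cell (a stand-in for any fitted propensity model).
    No outcome variable is used, so the weights are outcome-agnostic.
    """
    d = np.asarray(d, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    cell = np.asarray(cell)
    p_hat = np.empty_like(d)
    for c in np.unique(cell):
        in_c = cell == c
        # Design-weighted response rate in cell c.
        p_hat[in_c] = d[in_c & responded].sum() / d[in_c].sum()
    w = np.zeros_like(d)
    # Respondents always sit in a cell with a positive estimated propensity.
    w[responded] = d[responded] / p_hat[responded]
    return w

# Toy sample: six units, two cells, two nonrespondents.
d = [10, 10, 10, 10, 20, 20]
responded = [True, False, True, True, True, False]
cell = [0, 0, 0, 0, 1, 1]
w = ipw_weights(d, responded, cell)

# The same weights serve any survey variable observed on respondents.
y1 = np.array([3.0, 0.0, 5.0, 4.0, 7.0, 0.0])  # nonrespondent values are placeholders
y2 = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
total_y1 = np.sum(w * y1)
total_y2 = np.sum(w * y2)
```

Because the weights are computed once from response indicators and auxiliary information alone, a single weighted file can be released and reused for every survey variable, which is the operational appeal the finding refers to.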
Abstract
This pedagogical review examines the use of machine learning methods in finite-population inference for survey sampling, with an emphasis on design-based validity and statistical inference. While flexible prediction tools offer substantial gains in estimation accuracy, they also introduce important challenges, primarily due to the dependence between the fitted predictors and the sample. We focus on settings in which such predictions enter survey estimation through model-assisted estimation, item nonresponse imputation, and unit nonresponse adjustment. For model-assisted estimation and item nonresponse, we show how cross-fitting and Neyman-orthogonal estimating equations can adapt ideas from double/debiased machine learning to survey data, allowing the use of high-dimensional or nonparametric learners while preserving root-n consistency and asymptotic normality under suitable conditions.…
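The cross-fitting idea for model-assisted estimation can be sketched as follows. This is one plausible variant under stated assumptions, not necessarily the construction used in the review: the sample is split into folds, each predictor is fitted without the fold whose residuals it corrects, and the population prediction term averages the fold-specific predictors. The learner (`fit_learner`, a survey-weighted quadratic regression), the function `cross_fitted_total`, and the simulated population are all hypothetical stand-ins; any flexible learner could take the learner's place.

```python
import numpy as np

def fit_learner(x, y, w):
    """Stand-in learner: survey-weighted least squares on (1, x, x^2)."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return lambda x_new: np.column_stack(
        [np.ones_like(x_new), x_new, x_new**2]
    ) @ beta

def cross_fitted_total(x_pop, sample_idx, y_s, pi_s, n_folds=5, seed=0):
    """Cross-fitted difference-type estimator of the population total of y.

    x_pop: auxiliary variable, known for every population unit.
    sample_idx: population indices of the sampled units.
    y_s, pi_s: outcomes and inclusion probabilities of the sampled units.
    """
    rng = np.random.default_rng(seed)
    n = len(sample_idx)
    fold = rng.permutation(n) % n_folds  # balanced random fold labels
    x_s = x_pop[sample_idx]
    d_s = 1.0 / pi_s  # design weights
    pred_sum, resid_sum = 0.0, 0.0
    for j in range(n_folds):
        train = fold != j
        test = ~train
        # Fit without fold j, so residuals in fold j use an out-of-fold predictor.
        m = fit_learner(x_s[train], y_s[train], d_s[train])
        pred_sum += m(x_pop).sum() / n_folds
        resid_sum += np.sum(d_s[test] * (y_s[test] - m(x_s[test])))
    return pred_sum + resid_sum

# Simulated population and a simple random sample without replacement.
rng = np.random.default_rng(42)
N, n = 2000, 200
x_pop = rng.uniform(0.0, 1.0, N)
y_pop = 1.0 + 2.0 * x_pop + 3.0 * x_pop**2 + rng.normal(0.0, 0.5, N)
sample_idx = rng.choice(N, n, replace=False)
pi_s = np.full(n, n / N)

estimate = cross_fitted_total(x_pop, sample_idx, y_pop[sample_idx], pi_s)
true_total = y_pop.sum()
```

Fitting each predictor on the other folds breaks the dependence between the fitted predictor and the units whose residuals it corrects, which is the dependence the abstract identifies as the main challenge.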
