Debiased machine learning for combining probability and non-probability survey data
Shaun Seaman

TL;DR
This paper develops a framework for using machine learning in double robust estimators to estimate finite population means from combined probability and non-probability survey data, without relying on correctly specified parametric models.
Contribution
It introduces a general approach allowing valid use of machine learning for nuisance estimation in double robust estimators with complex survey designs.
Findings
DR estimators are asymptotically normal with machine learning nuisance estimates
Cross-fitting improves estimator validity under model misspecification
Simulations show better performance than estimators based on parametric nuisance models
Abstract
We consider the problem of estimating the finite population mean of an outcome variable using data from a nonprobability sample and auxiliary information from a probability sample. Existing double robust (DR) estimators of this mean require the estimation of two nuisance functions: the conditional probability of selection into the nonprobability sample given covariates that are observed in both samples, and the conditional expectation of the outcome given these covariates. These nuisance functions can be estimated using parametric models, but the resulting estimator of the mean will typically be biased if both parametric models are misspecified. It would therefore be advantageous to be able to use more flexible data-adaptive / machine-learning estimators of the nuisance functions. Here, we develop a general framework for the valid use of DR estimators of the mean when the…
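The sketch below illustrates the kind of existing parametric DR estimator the abstract describes. It is not the paper's machine-learning framework. It makes several assumptions: a single covariate x, a logistic model for selection into the nonprobability sample A, a linear outcome model fitted on A, and a probability sample B drawn by simple random sampling with design weights d_i. The estimator takes one common form:

mu_DR = sum_A (y_i - m(x_i)) / pi(x_i) / sum_A 1/pi(x_i) + sum_B d_i m(x_i) / sum_B d_i

All function names and the simulated data-generating process are hypothetical. In the paper's setting, flexible estimators would replace fit_selection_model and the least-squares fit.

```python
import numpy as np


def expit(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(x):
    """Design matrix with an intercept column and one covariate."""
    return np.column_stack([np.ones(len(x)), x])


def fit_selection_model(x_a, x_b, d_b, n_iter=50, tol=1e-10):
    """Hypothetical helper: logistic model for P(unit is in sample A | x).

    Solves the pseudo-score  sum_{i in A} x_i - sum_{i in B} d_i * pi(x_i) * x_i = 0
    by Newton's method. The population-level sum is estimated from the
    probability sample B using its design weights d_b.
    """
    X_a, X_b = add_intercept(x_a), add_intercept(x_b)
    theta = np.zeros(X_a.shape[1])
    # Start from the intercept-only solution: expit(theta_0) = n_A / N_hat.
    theta[0] = np.log(len(x_a) / (d_b.sum() - len(x_a)))
    for _ in range(n_iter):
        pi_b = expit(X_b @ theta)
        score = X_a.sum(axis=0) - (d_b * pi_b) @ X_b
        info = X_b.T @ (X_b * (d_b * pi_b * (1.0 - pi_b))[:, None])
        step = np.linalg.solve(info, score)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def dr_mean(y_a, x_a, x_b, d_b):
    """Hypothetical helper: DR estimate of the finite population mean of y."""
    X_a, X_b = add_intercept(x_a), add_intercept(x_b)
    theta = fit_selection_model(x_a, x_b, d_b)
    beta = np.linalg.lstsq(X_a, y_a, rcond=None)[0]  # linear outcome model fitted on A
    pi_a = expit(X_a @ theta)
    m_a, m_b = X_a @ beta, X_b @ beta
    # Inverse-propensity-weighted residuals from A, plus design-weighted predictions from B.
    resid_term = np.sum((y_a - m_a) / pi_a) / np.sum(1.0 / pi_a)
    pred_term = np.sum(d_b * m_b) / np.sum(d_b)
    return resid_term + pred_term


def demo(seed=2024, N=100_000, n_b=2_000):
    """Simulated finite population with a hypothetical data-generating process."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    y = 1.0 + 2.0 * x + rng.normal(size=N)
    in_a = rng.random(N) < expit(-3.0 + 0.8 * x)  # self-selection favours large x
    idx_b = rng.choice(N, size=n_b, replace=False)  # simple random sample B
    d_b = np.full(n_b, N / n_b)
    return {
        "truth": y.mean(),
        "naive": y[in_a].mean(),
        "dr": dr_mean(y[in_a], x[in_a], x[idx_b], d_b),
    }


if __name__ == "__main__":
    print(demo())
```

Selection into A depends on x, so the naive mean of A is biased upward. The DR estimate stays close to the true population mean whenever at least one of the two nuisance models is correctly specified. The abstract's concern is the case where both parametric models are wrong, which motivates replacing them with data-adaptive estimators.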
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Advanced Causal Inference Techniques · Statistical Methods and Inference
