Debiased machine learning for combining probability and non-probability survey data

Shaun Seaman

arXiv:2508.08948 · stat.ME · October 30, 2025

Open Access

TL;DR

This paper develops a framework for using machine learning in double robust estimators to accurately estimate population means from combined probability and non-probability survey data, even with model misspecification.

Contribution

It introduces a general approach allowing valid use of machine learning for nuisance estimation in double robust estimators with complex survey designs.

Findings

01

DR estimators are asymptotically normal with machine learning nuisance estimates

02

Cross-fitting improves estimator validity under model misspecification

03

Simulation shows superior performance over parametric models

Abstract

We consider the problem of estimating the finite population mean $\bar{Y}$ of an outcome variable $Y$ using data from a nonprobability sample and auxiliary information from a probability sample. Existing double robust (DR) estimators of this mean $\bar{Y}$ require the estimation of two nuisance functions: the conditional probability of selection into the nonprobability sample given covariates $X$ that are observed in both samples, and the conditional expectation of $Y$ given $X$. These nuisance functions can be estimated using parametric models, but the resulting estimator of $\bar{Y}$ will typically be biased if both parametric models are misspecified. It would therefore be advantageous to be able to use more flexible data-adaptive / machine-learning estimators of the nuisance functions. Here, we develop a general framework for the valid use of DR estimators of $\bar{Y}$ when the…
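To make the role of the two nuisance functions concrete, here is a minimal sketch of one DR form commonly used in the nonprobability-sampling literature: an inverse-probability-weighted residual term over the nonprobability sample plus a design-weighted prediction term over the probability sample. This excerpt does not show the paper's exact estimator, so the formula, the function name `dr_mean`, and all argument names are assumptions chosen for illustration. The nuisance predictions are taken as given inputs; how they are fitted (parametric model, machine learner, with or without cross-fitting) is outside this sketch.

```python
import numpy as np

def dr_mean(y_np, m_np, pi_np, m_p, d_p, N=None):
    """Illustrative doubly robust estimate of a finite population mean.

    Hypothetical argument names (not taken from the paper):
      y_np  : outcomes Y observed in the nonprobability sample
      m_np  : predicted E[Y | X] for the nonprobability-sample units
      pi_np : estimated selection probabilities for those same units
      m_p   : predicted E[Y | X] for the probability-sample units
      d_p   : design weights of the probability-sample units
      N     : population size; if None, estimated by the sum of d_p
    """
    y_np, m_np, pi_np, m_p, d_p = (
        np.asarray(a, dtype=float) for a in (y_np, m_np, pi_np, m_p, d_p)
    )
    if np.any(pi_np <= 0) or np.any(pi_np > 1):
        raise ValueError("selection probabilities must lie in (0, 1]")
    if N is None:
        N = d_p.sum()
    # Inverse-probability-weighted residuals from the nonprobability sample.
    residual_term = np.sum((y_np - m_np) / pi_np)
    # Design-weighted outcome predictions from the probability sample.
    prediction_term = np.sum(d_p * m_p)
    return (residual_term + prediction_term) / N

# Toy numbers, invented purely to exercise the function.
estimate = dr_mean(
    y_np=[2.0, 4.0],
    m_np=[1.0, 3.0],
    pi_np=[0.5, 0.25],
    m_p=[1.0, 2.0, 3.0],
    d_p=[10.0, 10.0, 10.0],
)
print(estimate)
```

In this form, if the outcome predictions match the observed outcomes exactly, the residual term vanishes and the estimate reduces to the design-weighted mean of the predictions, which gives some intuition for why a good outcome model can compensate for a poor selection model.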

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Advanced Causal Inference Techniques · Statistical Methods and Inference