Model Assisted Data Integration: An unbiased sampling strategy to use nonprobability data
Martin Hyllienmark, Gustaf Strandell

TL;DR
This paper introduces the Model Assisted Data Integration (MADI) sampling strategy, combining nonprobability and probability data to produce unbiased estimates with lower variance, supported by empirical data.
Contribution
It extends the Data Integrated estimator by proposing a new sampling strategy that ensures unbiasedness and variance reduction using machine learning models.
Findings
MADI achieves lower variance than traditional estimators.
The estimator remains design-unbiased with a suitable variance estimator.
Empirical data demonstrates improved estimation accuracy.
Abstract
The aim of survey statistics is to produce estimates with minimal bias and acceptable variance given a specific budget, preferably with a minor response burden for the participants. In recent years, considerable efforts have been made to achieve this through the extended use of found or non-probability data. However, to utilize such data safely, rigorous theoretical foundations are needed, where one main concern is the lack of control due to not having access to the selection mechanism for the data. Several methods have been proposed in the literature to deal with this, though they often rely on assumptions that may be difficult or impossible to verify in practice. Extending the Data Integrated (DI) estimator introduced by Kim and Tam (2021), this paper introduces the Model Assisted Data Integration (MADI) sampling strategy. The proposed sampling strategy…
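To make the idea of combining the two data sources concrete, the following is a minimal sketch of a basic data-integration estimator of a population total, in the spirit of the DI estimator the abstract cites. It is not the MADI strategy, whose details are not given in the text above. The sketch assumes that the study variable is observed without error for every unit in the nonprobability set, and that membership in that set can be determined for each unit in the probability sample. The function name `di_total` and all argument names are hypothetical.

```python
def di_total(y_nonprob, y_prob, weights, in_nonprob):
    """Sketch of a data-integration estimate of a population total.

    y_nonprob  : y values for all units in the nonprobability data set
    y_prob     : y values for the units in the probability sample
    weights    : design weights of the probability-sample units
    in_nonprob : flags, True if the sampled unit also appears in the
                 nonprobability data set (assumed observable)
    """
    # Units covered by the nonprobability data are counted exactly once each.
    covered_total = sum(y_nonprob)

    # The part of the population not covered is estimated from the
    # probability sample with its design weights, skipping covered units.
    uncovered_total = sum(
        d * y
        for y, d, covered in zip(y_prob, weights, in_nonprob)
        if not covered
    )
    return covered_total + uncovered_total


# Toy illustration with made-up numbers (not from the paper).
estimate = di_total(
    y_nonprob=[1, 2, 3],
    y_prob=[2, 5, 7],
    weights=[10, 10, 10],
    in_nonprob=[True, False, False],
)
print(estimate)
```

In this form the nonprobability data contribute no sampling error for the units they cover, and the unknown selection mechanism is never modeled: only the uncovered part of the population is estimated from the probability sample.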
