Model Assisted Data Integration: An unbiased sampling strategy to use nonprobability data

Martin Hyllienmark; Gustaf Strandell

arXiv:2604.00956 · stat.ME · April 2, 2026

TL;DR

This paper introduces the Model Assisted Data Integration (MADI) sampling strategy, which combines nonprobability and probability data to produce unbiased estimates with lower variance; the claims are supported with empirical data.

Contribution

The paper extends the Data Integrated (DI) estimator of Kim and Tam (2021) by proposing a new sampling strategy that uses machine learning models to reduce variance while preserving unbiasedness.

Findings

01 MADI achieves lower variance than traditional estimators.

02 The estimator remains design-unbiased and is accompanied by a suitable variance estimator.

03 Empirical data demonstrate improved estimation accuracy.

Abstract

The aim of survey statistics is to produce estimates with minimal bias and a correspondingly acceptable variance given a specific budget, preferably with a minor response burden for the participants. In recent years, considerable efforts have been made to achieve this through the extended use of found or non-probability data. However, to be able to safely utilize such data, rigorous theoretical foundations are needed, where one main concern is the lack of control due to not having access to the selection mechanism for the data. Several methods have been proposed in the literature to deal with this, though they often rely on assumptions that may be difficult or impossible to verify in practice. Extending the Data Integrated (DI) estimator introduced by Kim and Tam (2021), this paper introduces the Model Assisted Data Integration (MADI) sampling strategy. The proposed sampling strategy…
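For readers unfamiliar with the DI estimator that the abstract builds on, the following is a minimal Python sketch of its commonly cited form for a population total: the nonprobability data set B is summed as-is, and the probability sample A is used only to estimate the part of the population not covered by B. It does not implement MADI, whose details are not given in the truncated abstract above. The function names `di_total` and `simulate`, the simple random sampling design, and the simulated selection mechanism are all illustrative assumptions, as is the assumption that membership in B is observed for every unit in A.

```python
import random


def di_total(y_b, sample_a):
    """DI-type estimate of a population total (illustrative sketch).

    y_b      : y-values of all units in the nonprobability data set B
    sample_a : iterable of (y, weight, in_b) for the probability sample A,
               where weight is the design weight (1 / inclusion probability)
               and in_b flags whether the unit also appears in B
    """
    total_b = sum(y_b)  # B is used without weighting
    # A estimates only the total of the units NOT covered by B
    total_rest = sum(w * y for y, w, in_b in sample_a if not in_b)
    return total_b + total_rest


def simulate(n_pop=1000, n_a=100, reps=2000, seed=1):
    """Hypothetical simulation: B selects large y-values more often."""
    rng = random.Random(seed)
    y = [rng.gauss(50, 10) for _ in range(n_pop)]
    # Selection into B depends on y, so B alone is a biased source
    in_b = [rng.random() < (0.6 if yi > 50 else 0.2) for yi in y]
    y_b = [yi for yi, b in zip(y, in_b) if b]

    true_total = sum(y)
    naive_total = n_pop * sum(y_b) / len(y_b)  # scales up the mean of B

    weight = n_pop / n_a  # design weight under simple random sampling
    estimates = []
    for _ in range(reps):
        idx = rng.sample(range(n_pop), n_a)  # draw A without replacement
        sample_a = [(y[i], weight, in_b[i]) for i in idx]
        estimates.append(di_total(y_b, sample_a))

    mean_di = sum(estimates) / reps
    return true_total, naive_total, mean_di


if __name__ == "__main__":
    true_total, naive_total, mean_di = simulate()
    print(f"true total:        {true_total:.0f}")
    print(f"naive B-only:      {naive_total:.0f}")
    print(f"mean DI estimate:  {mean_di:.0f}")
```

In this sketch the B-only estimate is biased because selection into B depends on y, while the average of the DI-type estimates over repeated probability samples should land close to the true total. No knowledge of how B was selected is needed, only the membership flag observed in A.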
