Semiparametric Efficient Data Integration Using the Dual-Frame Sampling Framework
Kosuke Morikawa, Jae Kwang Kim

TL;DR
This paper develops semiparametric methods for integrating probability and non-probability samples, providing efficient estimators that work under unknown sampling mechanisms and are robust to misspecification, with practical implementation in R.
Contribution
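It introduces two estimators for dual-frame data integration: one models the non-probability inclusion probability parametrically and attains the semiparametric efficiency bound, while the other avoids explicit modeling of the non-probability sampling mechanism and is therefore robust to its misspecification.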
Findings
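The first estimator, which models the non-probability inclusion probability parametrically, attains the semiparametric efficiency bound.
The second estimator avoids explicit modeling of the non-probability mechanism; it is not fully efficient but is robust to misspecification of that model.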
Simulations demonstrate efficiency gains and stable performance.
Abstract
Integrating probability and non-probability samples is increasingly important, yet unknown sampling mechanisms in non-probability sources complicate identification and efficient estimation. We develop semiparametric theory for dual-frame data integration and propose two complementary estimators. The first models the non-probability inclusion probability parametrically and attains the semiparametric efficiency bound. We introduce an identifiability condition based on strong monotonicity that identifies sampling-model parameters without instrumental variables, even under informative (non-ignorable) selection, using auxiliary information from the probability sample; it remains valid without record linkage between samples. The second estimator, motivated by a two-stage sampling approximation, avoids explicit modeling of the non-probability mechanism; though not fully efficient, it is…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Statistical Methods and Inference
