Thresholding Nonprobability Units in Combined Data for Efficient Domain Estimation
Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky

TL;DR
This paper proposes a thresholding method to exclude nonprobability units with low joint probability of appearing in both survey and convenience samples, aiming to improve domain estimation accuracy.
Contribution
It introduces a novel thresholding approach for nonprobability units to reduce bias and variance in combined survey estimates, validated through simulation.
Findings
Excluding units with low joint probability reduces estimation error.
Thresholding improves the bias-variance trade-off in domain estimation.
A simulation study on two classes of datasets compares the proposed method with other thresholding constructions.
Abstract
Quasi-randomization approaches estimate latent participation probabilities for units from a nonprobability / convenience sample. Estimation of participation probabilities for convenience units allows their combination with units from the randomized survey sample to form a survey weighted domain estimate. One leverages convenience units for domain estimation under the expectation that estimation precision and bias will improve relative to solely using the survey sample; however, convenience sample units that are very different in their covariate support from the survey sample units may inflate estimation bias or variance. This paper develops a method to threshold or exclude convenience units to minimize the variance of the resulting survey weighted domain estimator. We compare our thresholding method with other thresholding constructions in a simulation study for two classes of datasets…
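The abstract combines two ideas: weighting survey and convenience units into one domain estimate, and dropping convenience units to reduce variance. The sketch below illustrates both under simplifying assumptions that are ours, not the authors'. Each unit's weight is the reciprocal of its known (survey) or estimated (convenience) probability. The domain estimate is a Hájek ratio mean. The threshold is picked from a grid by minimizing a Poisson-sampling linearization variance. The names `Unit`, `threshold`, `approx_variance`, and `choose_threshold` are hypothetical, and the toy data are made up. The paper's actual estimator and threshold construction may differ, and this variance-only criterion ignores any bias introduced by dropping units.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Unit:
    y: float           # outcome value
    in_domain: bool    # True if the unit belongs to the target domain
    prob: float        # survey inclusion probability, or (assumed already estimated)
                       # participation probability for a convenience unit
    convenience: bool  # True if the unit comes from the nonprobability sample


def threshold(units: Sequence[Unit], tau: float) -> List[Unit]:
    """Keep every survey unit; keep a convenience unit only if prob >= tau."""
    return [u for u in units if not u.convenience or u.prob >= tau]


def domain_mean(units: Sequence[Unit]) -> float:
    """Hajek (ratio) domain mean with weights 1 / prob."""
    d = [u for u in units if u.in_domain]
    n_hat = sum(1.0 / u.prob for u in d)
    return sum(u.y / u.prob for u in d) / n_hat


def approx_variance(units: Sequence[Unit]) -> float:
    """Linearization variance of the Hajek domain mean, assuming independent
    (Poisson) inclusion of each unit; ignores error in estimated probabilities."""
    d = [u for u in units if u.in_domain]
    n_hat = sum(1.0 / u.prob for u in d)
    m = domain_mean(units)
    return sum((1.0 - u.prob) / u.prob**2 * (u.y - m) ** 2 for u in d) / n_hat**2


def choose_threshold(units, grid) -> Optional[Tuple[float, float]]:
    """Return (tau, variance) for the grid value with the smallest variance."""
    best = None
    for tau in grid:
        kept = threshold(units, tau)
        if not any(u.in_domain for u in kept):
            continue  # no domain units left, so the estimator is undefined
        v = approx_variance(kept)
        if best is None or v < best[1]:
            best = (tau, v)
    return best


# Hypothetical toy data: four survey units (prob 0.5) and two convenience units,
# one with a tiny estimated probability (weight 100) and a far-off outcome.
demo_units = [
    Unit(10.0, True, 0.5, False),
    Unit(12.0, True, 0.5, False),
    Unit(11.0, True, 0.5, False),
    Unit(5.0, False, 0.5, False),
    Unit(11.0, True, 0.2, True),
    Unit(30.0, True, 0.01, True),
]

if __name__ == "__main__":
    tau, var = choose_threshold(demo_units, [0.0, 0.05, 0.3])
    print(f"chosen tau = {tau}, approx variance = {var:.4f}")
    print(f"domain mean = {domain_mean(threshold(demo_units, tau)):.3f}")
```

On the toy data, the grid value 0.05 wins. It drops the convenience unit with probability 0.01, whose weight of 100 dominates the variance, while retaining the unit with probability 0.2. Raising the threshold to 0.3 also discards that second unit and increases the variance again. This mirrors the trade-off the abstract describes: convenience units can add precision, but units far from the survey's covariate support can inflate it.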
