An Empirical Comparison of Methods to Produce Business Statistics Using Non-Probability Data
Lyndon Ang, Robert Clark, Bronwyn Loong, Anders Holmberg

TL;DR
This study compares methods for producing business population estimates from non-probability data, showing how well each approach performs under different data quality and missingness scenarios.
Contribution
It provides a comprehensive simulation-based comparison of methods to correct biases in non-probability business data for official statistics.
Findings
The screening dual-frame approach reduces sample size and MSE when no measurement error is present.
Measurement error and missingness increase estimator errors.
Model-assisted estimators based on probability samples perform best under data imperfections.
Abstract
There is a growing trend among statistical agencies to explore non-probability data sources for producing more timely and detailed statistics, while reducing costs and respondent burden. Coverage and measurement error are two issues that may be present in such data. The imperfections may be corrected using available information relating to the population of interest, such as a census or a reference probability sample. In this paper, we compare a wide range of existing methods for producing population estimates using a non-probability dataset through a simulation study based on a realistic business population. The study was conducted to examine the performance of the methods under different missingness and data quality assumptions. The results confirm the ability of the methods examined to address selection bias. When no measurement error is present in the non-probability dataset, a…
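To make the idea of correcting selection bias with auxiliary population information concrete, the following is a minimal sketch of post-stratification, a standard weighting technique. It is an illustration only: the abstract does not confirm that this exact estimator is among those compared, and the function name, stratum labels, and numbers are all hypothetical.

```python
from collections import defaultdict


def poststratified_total(sample, population_counts):
    """Estimate a population total from a non-probability sample.

    sample: list of (stratum, value) pairs from the non-probability data.
    population_counts: dict mapping stratum -> known number of units,
        assumed to come from a census or business register (hypothetical).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        sums[stratum] += value
        counts[stratum] += 1

    total = 0.0
    for stratum, n_pop in population_counts.items():
        if counts[stratum] == 0:
            # A stratum with no observed units cannot be corrected by weighting.
            raise ValueError(f"no sample units in stratum {stratum!r}")
        # Scale the stratum sample mean up to the known stratum size.
        total += n_pop * sums[stratum] / counts[stratum]
    return total


# Hypothetical data: large businesses are over-represented in the sample.
sample = [("small", 10.0), ("small", 14.0), ("large", 100.0)]
population_counts = {"small": 90, "large": 10}

estimate = poststratified_total(sample, population_counts)

# Naive estimate that ignores selection: population size times the sample mean.
naive = sum(population_counts.values()) * sum(v for _, v in sample) / len(sample)
```

In this toy example the naive estimate is pulled upward because large businesses make up a third of the sample but only a tenth of the population, while the post-stratified estimate reweights each stratum to its known size. The correction relies on selection being ignorable within strata and on the observed values being free of measurement error, the two kinds of assumption the study varies.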
Taxonomy
Topics: Forecasting Techniques and Applications
