Integrating Big Data and Survey Data for Efficient Estimation of the Median
Ryan Covey (Methodology, Data Science Division, Australian Bureau of Statistics)

TL;DR
This paper proposes a new estimator for the median that combines big data and survey data, reducing bias and variance for more accurate population estimates.
Contribution
It introduces a novel design-based median estimator integrating big data with survey data, improving accuracy over traditional survey-only methods.
Findings
Estimator is asymptotically unbiased.
Estimator has smaller variance than survey-only median.
Integrating survey data corrects the selection bias that affects estimates based on big data alone.
Abstract
An ever-increasing deluge of big data is becoming available to national statistical offices globally, but it is well documented that statistics produced by big data alone often suffer from selection bias and are not usually representative of the population at large. In this paper, we construct a new design-based estimator of the median by integrating big data and survey data. Our estimator is asymptotically unbiased and has a smaller variance than a median estimator produced using survey data alone.
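The abstract does not spell out the estimator, so the following is only a minimal sketch of one generic way to combine the two sources, not the paper's method. It assumes that every big-data unit is observed without error and counts for itself (weight 1), that survey respondents can be flagged as present in or absent from the big data, and that the survey design weights of the absent respondents represent the part of the population the big data misses. The function names `weighted_median` and `integrated_median` and all of their parameters are hypothetical.

```python
import numpy as np


def weighted_median(values, weights):
    """Smallest value at which the weighted empirical CDF reaches 0.5."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    # First index whose cumulative weight share is at least one half.
    return v[np.searchsorted(cdf, 0.5)]


def integrated_median(big_values, survey_values, survey_weights, in_big_data):
    """Illustrative combined median (an assumption, not the paper's estimator).

    Big-data units enter with weight 1; survey units that are NOT in the
    big data enter with their design weights and stand in for the
    population the big data fails to cover.
    """
    big_values = np.asarray(big_values, dtype=float)
    survey_values = np.asarray(survey_values, dtype=float)
    survey_weights = np.asarray(survey_weights, dtype=float)
    outside = ~np.asarray(in_big_data, dtype=bool)

    values = np.concatenate([big_values, survey_values[outside]])
    weights = np.concatenate([np.ones(len(big_values)), survey_weights[outside]])
    return weighted_median(values, weights)


# Toy data: the big data covers only small values (selection bias).
big = [1, 2, 3, 4]
survey = [2, 10, 12]
design_weights = [5, 5, 5]
overlap = [True, False, False]
estimate = integrated_median(big, survey, design_weights, overlap)
```

In this toy example the median of the big data alone sits among the small values, while the combined estimate is pulled toward the larger values that only the survey observes, which is the kind of selection-bias correction the abstract describes.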
Taxonomy
Topics: Statistical Methods and Inference · Data Mining Algorithms and Applications · Bayesian Methods and Mixture Models
