A Bayesian algorithm for sample selection bias correction
Valerio Astuti

TL;DR
This paper introduces a Bayesian algorithm that combines large social media datasets with traditional survey data to correct for sample selection bias, making insights drawn from big data sources more reliable.
Contribution
The paper proposes a novel Bayesian method to integrate non-traditional social media data with survey statistics for bias correction.
Findings
Effective bias correction demonstrated in case studies
Enhanced representativeness of social media data
Improved accuracy in population estimates
Abstract
In this paper we present a technique to couple non-traditional data with statistics based on survey data, in order to partially correct for the bias produced by non-random sample selection. All major social media platforms represent huge samples of the general population, generated by a self-selection process. This implies that they are not representative of the larger public, and conclusions drawn from these samples cannot safely be extrapolated to the whole population. We present an algorithm to integrate these massive data with data from traditional sources, which are less extensive but more reliable. This integration allows us to exploit the best of both worlds, combining the detail of typical "big data" sources with the representativeness of a carefully designed sample survey.
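The abstract does not spell out the algorithm itself. Purely as an illustration of the general idea, the following is a minimal sketch assuming a simple Bayesian post-stratification scheme, which is a swapped-in textbook technique and not the author's method. The large self-selected sample supplies detailed within-stratum outcome rates (Beta posteriors), while the small representative survey supplies the population share of each stratum (a Dirichlet posterior). The function name `corrected_estimate`, the stratum setup, and the uniform priors are hypothetical choices made for this example.

```python
import numpy as np


def corrected_estimate(social_successes, social_totals, survey_counts,
                       n_draws=10_000, prior=1.0, seed=0):
    """Hypothetical Bayesian post-stratification sketch (not the paper's algorithm).

    social_successes[k], social_totals[k]: outcome counts in stratum k of the
        large, self-selected (e.g. social media) sample.
    survey_counts[k]: respondents in stratum k of the small, representative
        survey; used only to learn how large each stratum is in the population.
    Returns the posterior mean of the population proportion and a 95% interval.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(social_successes, dtype=float)
    n = np.asarray(social_totals, dtype=float)
    m = np.asarray(survey_counts, dtype=float)

    # Population stratum shares: Dirichlet posterior from the survey counts
    # under a symmetric Dirichlet(prior) prior. Shape: (n_draws, K).
    shares = rng.dirichlet(m + prior, size=n_draws)

    # Within-stratum outcome rates: Beta posterior from the big sample
    # under a Beta(prior, prior) prior. Shape: (n_draws, K).
    rates = rng.beta(s + prior, n - s + prior, size=(n_draws, len(s)))

    # Reweight each stratum's rate by its survey-based population share,
    # once per posterior draw, to get draws of the population proportion.
    draws = (shares * rates).sum(axis=1)
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return draws.mean(), (lo, hi)


# Made-up toy data: young users dominate the platform (9000 vs 1000),
# but the survey suggests they are only about 30% of the population.
est, (lo, hi) = corrected_estimate([6300, 300], [9000, 1000], [300, 700])
print(f"corrected: {est:.3f}  95% interval: ({lo:.3f}, {hi:.3f})")
```

On these invented numbers the naive pooled rate of the social media sample is 6600/10000 = 0.66, while the reweighted posterior mean lands near 0.42, because the survey shifts weight toward the under-represented stratum. The interval reflects uncertainty in both the survey shares and the within-stratum rates; a real correction method would also have to handle selection that operates within strata, which this sketch ignores.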
Taxonomy
Topics: Bayesian Methods and Mixture Models · Data-Driven Disease Surveillance · Complex Network Analysis Techniques
