A Bayesian algorithm for sample selection bias correction

Valerio Astuti

arXiv:2212.09813·stat.ME·December 21, 2022

TL;DR

This paper introduces a Bayesian algorithm that combines large-scale social media data with traditional survey data to correct for sample selection bias, making insights drawn from big data sources more reliable.

Contribution

The paper proposes a novel Bayesian method to integrate non-traditional social media data with survey statistics for bias correction.

Findings

01 Effective bias correction demonstrated in case studies
02 Enhanced representativeness of social media data
03 Improved accuracy in population estimates

Abstract

In this paper we present a technique to couple non-traditional data with statistics based on survey data, in order to partially correct for the bias produced by non-random sample selection. All major social media platforms represent huge samples of the general population, generated by a self-selection process. This implies that they are not representative of the larger public, and that conclusions drawn from these samples cannot be straightforwardly extrapolated to the whole population. We present an algorithm to integrate these massive data with data from traditional sources, which are less extensive but more reliable. This integration allows us to exploit the best of both worlds, combining the detail of typical "big data" sources with the representativeness of a carefully designed sample survey.
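The abstract does not spell out the algorithm itself. As an illustration of the general idea only, and not the method of this paper, here is a minimal Bayesian post-stratification sketch. It assumes the population splits into a few strata (e.g. age groups), that the large self-selected sample gives a usable per-stratum rate but a distorted stratum mix, and that a small representative survey gives respondent counts per stratum. The survey counts update a Dirichlet prior over the true stratum shares, and each posterior draw reweights the big-data rates, giving a posterior for the population-level rate. All function names, parameters, and numbers are hypothetical.

```python
import random
from statistics import mean


def naive_estimate(counts, rates):
    """Pooled rate that trusts the self-selected sample's own stratum mix."""
    return sum(c * r for c, r in zip(counts, rates)) / sum(counts)


def sample_dirichlet(alphas, rng):
    """One Dirichlet(alphas) draw, obtained by normalising Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def poststratified_estimate(survey_counts, big_data_rates,
                            n_draws=2000, prior=1.0, seed=0):
    """Hypothetical sketch: survey counts update a symmetric Dirichlet prior
    on stratum shares; each share draw reweights the big-data rates.
    Assumes selection is ignorable *within* each stratum.
    Returns the posterior mean and a central 90% interval."""
    rng = random.Random(seed)
    alphas = [c + prior for c in survey_counts]  # Dirichlet posterior parameters
    estimates = []
    for _ in range(n_draws):
        shares = sample_dirichlet(alphas, rng)
        estimates.append(sum(w * r for w, r in zip(shares, big_data_rates)))
    estimates.sort()
    lo = estimates[int(0.05 * n_draws)]
    hi = estimates[int(0.95 * n_draws) - 1]
    return mean(estimates), (lo, hi)


# Hypothetical inputs for three strata (young, middle, old).
big_counts = [8000, 1500, 500]   # self-selected platform users, skewed young
big_rates = [0.6, 0.4, 0.2]      # per-stratum rates observed on the platform
survey_counts = [300, 400, 300]  # small representative survey

naive = naive_estimate(big_counts, big_rates)
est, (lo, hi) = poststratified_estimate(survey_counts, big_rates)
print(f"naive: {naive:.3f}  post-stratified: {est:.3f}  90% interval: ({lo:.3f}, {hi:.3f})")
```

With these made-up inputs the naive pooled rate is 0.55, dominated by the over-represented first stratum, whereas the survey-weighted posterior centres near 0.40. The design choice here is that the big sample supplies detail (per-stratum rates) and the survey supplies representativeness (stratum shares with uncertainty); if selection is not ignorable within strata, reweighting of this kind cannot remove the bias on its own.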

Taxonomy

Topics: Bayesian Methods and Mixture Models · Data-Driven Disease Surveillance · Complex Network Analysis Techniques