Social Media Data for Population Mapping: A Bayesian Approach to Address Representativeness and Privacy Challenges
Paolo Andrich, Shengjie Lai, Halim Jun, Qianwen Duan, Zhifeng Cheng, Seth R. Flaxman, Andrew J. Tatem

TL;DR
This paper develops a Bayesian model to improve population estimates from social media data, addressing privacy-induced biases and enabling dynamic, reliable population monitoring for disaster response.
Contribution
It introduces a Bayesian imputation and modeling framework that corrects for privacy biases and links social media data to true populations, enhancing real-time demographic monitoring.
Findings
Bayesian imputation recovers missing data in rural areas.
Model achieves 18-24% error in population proportion estimates.
Accounting for overdispersion and spatial correlation improves accuracy.
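To illustrate why overdispersion matters for count data such as user counts, here is a minimal sketch comparing the variance implied by a Poisson model with that of a negative binomial model. The negative binomial is a common choice for overdispersed counts, but the summary above does not state which likelihood the paper uses, so the mean-dispersion parameterization and the names `mu` and `phi` are assumptions made for illustration only.

```python
def poisson_variance(mu: float) -> float:
    """Variance of a Poisson count with mean mu (variance equals the mean)."""
    return mu


def negbin_variance(mu: float, phi: float) -> float:
    """Variance of a negative binomial count with mean mu and dispersion phi.

    Assumed parameterization: Var = mu + mu**2 / phi, so a smaller phi
    means stronger overdispersion and phi -> infinity recovers the Poisson.
    """
    return mu + mu ** 2 / phi


if __name__ == "__main__":
    mu = 100.0  # hypothetical expected user count in one area
    for phi in (1.0, 10.0, 1000.0):
        print(phi, poisson_variance(mu), negbin_variance(mu, phi))
```

With the same mean, the negative binomial allows much wider spread than the Poisson, which is one reason a model that ignores overdispersion can be overconfident.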
Abstract
Accurate and timely population data are essential for disaster response and humanitarian planning, but traditional censuses often cannot capture rapid demographic changes. Social media data offer a promising alternative for dynamic population monitoring, but their representativeness remains poorly understood and stringent privacy requirements limit their reliability. Here, we address these limitations in the context of the Philippines by calibrating Facebook user counts with the country's 2020 census figures. First, we find that differential privacy techniques commonly applied to social media-based population datasets disproportionately mask low-population areas. To address this, we propose a Bayesian imputation approach to recover missing values, restoring data coverage for rural areas. Further, using the imputed social media data and leveraging predictors such as…
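The masking effect described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that masking acts as a simple suppression threshold on Poisson-distributed user counts; the abstract does not specify the mechanism. The threshold value, the expected counts, and the function names are hypothetical, and the truncated-Poisson conditional mean below is a simple stand-in, not the paper's Bayesian imputation model.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def prob_masked(lam: float, threshold: int) -> float:
    """Probability that a count falls below the (assumed) suppression threshold."""
    return sum(poisson_pmf(k, lam) for k in range(threshold))


def mean_given_masked(lam: float, threshold: int) -> float:
    """E[X | X < threshold]: a naive fill-in value for a masked cell."""
    p = prob_masked(lam, threshold)
    return sum(k * poisson_pmf(k, lam) for k in range(threshold)) / p


if __name__ == "__main__":
    threshold = 10  # hypothetical suppression threshold
    for lam in (5.0, 20.0, 50.0):  # hypothetical expected user counts
        print(lam, prob_masked(lam, threshold))
    print(mean_given_masked(5.0, threshold))
```

Under these assumptions, areas with few expected users are masked far more often than populous ones, which is the pattern the abstract reports for low-population areas. A full Bayesian treatment would estimate the expected counts jointly with the masked values rather than plugging in a known rate.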
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Data-Driven Disease Surveillance · COVID-19 epidemiological studies
