Unrepresentative Big Surveys Significantly Overestimate US Vaccine Uptake
Valerie C. Bradley, Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-Li Meng, Seth Flaxman

TL;DR
Large surveys with unrepresentative samples significantly overestimate COVID-19 vaccine uptake in the US, demonstrating that data quality is more crucial than sheer data size for accurate public health estimates.
Contribution
This paper demonstrates the Big Data Paradox (Meng 2018) in COVID-19 vaccine surveys. Without representativeness, a larger sample does not remove bias. It only shrinks the confidence interval around a biased estimate.
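The paradox rests on an error identity from Meng (2018), which is restated here and not taken from this page. In the notation below, G is the outcome (1 if an adult has had a first dose) and R is the indicator that a population member ends up in the sample. N is the population size, n is the sample size, and f = n/N. With population moments (divisor N), the error of the sample mean factors exactly as shown. Dividing by the nominal standard error σ_G/√n gives the second form:

```latex
\bar{G}_n - \bar{G}_N
  = \underbrace{\rho_{R,G}}_{\text{data quality}}
  \times \underbrace{\sqrt{\frac{1-f}{f}}}_{\text{data quantity}}
  \times \underbrace{\sigma_G}_{\text{problem difficulty}},
\qquad
\frac{\bar{G}_n - \bar{G}_N}{\sigma_G/\sqrt{n}} = \rho_{R,G}\,\sqrt{N-n}.
```

The second form explains why the intervals mislead. The estimate sits ρ_{R,G}·√(N − n) nominal standard errors from the truth. With N around 250 million US adults, a correlation of only 0.001 already puts it about 16 standard errors away.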
Findings
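By May 2021, the two big surveys (Delphi-Facebook and Census Household Pulse) overestimated first-dose vaccine uptake by 14 to 17 percentage points relative to the CDC benchmark.
Because their samples were so large, these biased estimates came with minuscule margins of error.
A much smaller, well-conducted survey (an Axios-Ipsos online panel) provided more accurate estimates.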
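The point about minuscule margins of error can be checked with the textbook 95% margin of error for a proportion, 1.96 × sqrt(p(1 − p)/n). This formula assumes simple random sampling and ignores selection bias entirely. In the sketch below, the 0.50 benchmark and both estimates are hypothetical. Only the 250,000 sample size and the 17-point gap echo the abstract. The helpers `margin_of_error` and `covers` are illustrative names.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation CI for a proportion.

    Assumes simple random sampling: it shrinks like 1/sqrt(n) and
    does not account for selection bias at all.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def covers(p_hat: float, n: int, truth: float, z: float = 1.96) -> bool:
    """True if the nominal CI around p_hat contains the true value."""
    moe = margin_of_error(p_hat, n, z)
    return p_hat - moe <= truth <= p_hat + moe


if __name__ == "__main__":
    truth = 0.50  # hypothetical benchmark uptake
    surveys = [
        ("big, biased (17 points high)", 0.67, 250_000),
        ("small, roughly unbiased", 0.52, 1_000),
    ]
    for label, p_hat, n in surveys:
        moe = margin_of_error(p_hat, n)
        print(f"{label}: {p_hat:.2f} ± {moe:.4f}, covers truth: {covers(p_hat, n, truth)}")
```

The big survey's interval is about ±0.0018 and misses the benchmark. The small survey's interval is about ±0.031 and covers it, even though its point estimate is also slightly off.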
Abstract
Surveys are a crucial tool for understanding public opinion and behavior, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the impact of survey bias, an instance of the Big Data Paradox (Meng 2018). Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from two large surveys: Delphi-Facebook (about 250,000 responses per week) and Census Household Pulse (about 75,000 per week). By May 2021, Delphi-Facebook overestimated uptake by 17 percentage points and Census Household Pulse by 14, compared to a benchmark from the Centers for Disease Control and Prevention (CDC). Moreover, their large data sizes led to minuscule margins of error on the incorrect estimates. In contrast, an Axios-Ipsos online panel with about 1,000…
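The abstract's claim that more data shrinks intervals without removing bias can be illustrated with a small simulation of Meng's identity, error = ρ_{R,G} × sqrt((1 − f)/f) × σ_G, where f = n/N. This is a minimal sketch under hypothetical assumptions. It uses a population of 200,000 with 50% vaccinated. Response rates are set to 30% for vaccinated and 20% for unvaccinated people, deliberately exaggerated so the effect shows at this small scale. None of these numbers come from the paper, and `simulate` and `meng_decomposition` are illustrative names.

```python
import math
import random
from typing import Sequence, Tuple


def meng_decomposition(G: Sequence[int], R: Sequence[int]) -> Tuple[float, float, float]:
    """Return (actual error, rho * sqrt((1-f)/f) * sigma_G, rho).

    G is a 0/1 outcome and R a 0/1 inclusion indicator over the whole
    population. Population moments (divide by N) make the identity exact.
    """
    N = len(G)
    n = sum(R)
    f = n / N
    mean_pop = sum(G) / N
    mean_sample = sum(g for g, r in zip(G, R) if r) / n
    cov = sum(g * r for g, r in zip(G, R)) / N - f * mean_pop
    sigma_G = math.sqrt(sum((g - mean_pop) ** 2 for g in G) / N)
    sigma_R = math.sqrt(f * (1 - f))
    rho = cov / (sigma_R * sigma_G)
    return mean_sample - mean_pop, rho * math.sqrt((1 - f) / f) * sigma_G, rho


def simulate(N=200_000, p_vax=0.5, resp_vax=0.30, resp_unvax=0.20, n_srs=1_000, seed=0):
    """Build a hypothetical population plus a self-selected and a random sample."""
    rng = random.Random(seed)
    G = [1 if rng.random() < p_vax else 0 for _ in range(N)]
    # Self-selected "big" sample: vaccinated people respond more often.
    R_big = [1 if rng.random() < (resp_vax if g else resp_unvax) else 0 for g in G]
    # Small simple random sample drawn without replacement.
    chosen = set(rng.sample(range(N), n_srs))
    R_small = [1 if i in chosen else 0 for i in range(N)]
    return G, R_big, R_small


if __name__ == "__main__":
    G, R_big, R_small = simulate()
    for label, R in [("self-selected", R_big), ("SRS n=1,000", R_small)]:
        err, identity, rho = meng_decomposition(G, R)
        print(f"{label}: n={sum(R)}, error={err:+.4f}, identity={identity:+.4f}, rho={rho:+.5f}")
```

The self-selected sample is roughly 50 times larger, yet its error is near +0.10 (about 0.15/0.25 − 0.5). The random sample of 1,000 typically misses by only a percentage point or two. In both cases the two computed columns agree, because the decomposition is an algebraic identity rather than an approximation.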
Taxonomy
Topics: Data-Driven Disease Surveillance · COVID-19 epidemiological studies · Survey Methodology and Nonresponse
