Predicting Zip Code-Level Vaccine Hesitancy in US Metropolitan Areas Using Machine Learning Models on Public Tweets
Sara Melotte, Mayank Kejriwal

TL;DR
This study evaluates machine learning models that use public Twitter data to predict zip code-level COVID-19 vaccine hesitancy in US metropolitan areas, and reports that the best models outperform constant-prior baselines.
Contribution
It presents a methodology for predicting vaccine hesitancy at the zip code level from social media data and socioeconomic features, with a comparative evaluation of models.
Findings
Best models outperform constant priors
Open-source tools can be used for setup
Feasibility of real-time social media-based prediction
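To make the first finding concrete, here is a minimal sketch of what comparing a model against a constant prior can look like. It is not the authors' pipeline. The data is synthetic, the single feature (a made-up per-zip-code "tweet signal"), the one-variable least-squares model, the train/test split, and every function name are assumptions chosen for illustration. The constant prior predicts the mean training hesitancy for every zip code, and both predictors are scored by root mean squared error on held-out zip codes.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def fit_constant_prior(y_train):
    """Constant prior: always predict the mean of the training labels."""
    return sum(y_train) / len(y_train)


def fit_simple_linear(x_train, y_train):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(x_train)
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_synthetic_zip_codes(n, seed=0):
    """Hypothetical data: one feature in [0, 1] and a hesitancy rate in [0, 1].

    The linear relationship and noise level are invented for this sketch.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [min(1.0, max(0.0, 0.2 + 0.3 * x + rng.gauss(0, 0.03))) for x in xs]
    return xs, ys


# Hold out the last 50 synthetic zip codes for evaluation.
xs, ys = make_synthetic_zip_codes(200)
x_train, y_train = xs[:150], ys[:150]
x_test, y_test = xs[150:], ys[150:]

prior = fit_constant_prior(y_train)
slope, intercept = fit_simple_linear(x_train, y_train)

baseline_rmse = rmse(y_test, [prior] * len(y_test))
model_rmse = rmse(y_test, [slope * x + intercept for x in x_test])

print(f"constant prior RMSE: {baseline_rmse:.4f}")
print(f"linear model RMSE:   {model_rmse:.4f}")
```

On this synthetic data the linear model beats the constant prior by construction, because the labels were generated from the feature. On real data the comparison is only informative if the evaluation zip codes are kept out of training, as they are here.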
Abstract
Although the recent rise and uptake of COVID-19 vaccines in the United States has been encouraging, there continues to be significant vaccine hesitancy in various geographic and demographic clusters of the adult population. Surveys, such as the one conducted by Gallup over the past year, can be useful in determining vaccine hesitancy, but can be expensive to conduct and do not provide real-time data. At the same time, the advent of social media suggests that it may be possible to get vaccine hesitancy signals at an aggregate level (such as at the level of zip codes) by using machine learning models and socioeconomic (and other) features from publicly available sources. It is an open question at present whether such an endeavor is feasible, and how it compares to baselines that only use constant priors. To our knowledge, a proper methodology and evaluation results using real data has…
Taxonomy
Topics: Vaccine Coverage and Hesitancy · Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts
