Tracing State-Level Obesity Prevalence from Sentence Embeddings of Tweets: A Feasibility Study
Xiaoyi Zhang, Rodoniki Athanasiadou, Narges Razavian

TL;DR
This study demonstrates a deep learning method that uses hashtag-supervised tweet embeddings to estimate state-level obesity prevalence, outperforming keyword-based methods and revealing potential risk factors from Twitter data.
Contribution
Introduces a hashtag-supervised deep learning approach for extracting informative tweet embeddings to estimate public health metrics from Twitter data.
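As a rough illustration of what "hashtag supervision" could look like in practice, the sketch below turns raw tweets into (text, labels) training pairs by stripping the hashtags out of the text and using them as prediction targets. The function name `make_training_pair`, the lowercasing of tags, and the example tweets are all assumptions made for illustration; the summary does not specify how the authors preprocess tweets or choose labels.

```python
import re

# Matches a hashtag and captures the tag without the leading '#'.
HASHTAG = re.compile(r"#(\w+)")


def make_training_pair(tweet):
    """Split a tweet into (text without hashtags, list of lowercased hashtags).

    Hypothetical helper: the hashtags act as free labels, and they are
    removed from the input so a model cannot simply copy them.
    """
    labels = [tag.lower() for tag in HASHTAG.findall(tweet)]
    # Replace each hashtag with a space, then collapse repeated whitespace.
    text = " ".join(HASHTAG.sub(" ", tweet).split())
    return text, labels


# Made-up example tweets, not data from the study.
toy_tweets = [
    "Pizza night again #Pizza #cheatday",
    "Morning run done #fitness",
    "No hashtags in this one",
]

# Keep only tweets that carry at least one hashtag, since the rest
# provide no supervision signal under this scheme.
pairs = [p for p in map(make_training_pair, toy_tweets) if p[1]]
```

A model trained to predict the labels from the text would then supply the tweet embeddings; which architecture and which hashtags the paper uses is not stated here.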
Findings
Strong correlation between textual features and government obesity data
Outperforms keyword-matching baseline in estimation accuracy
Potential to discover new risk factors from textual features
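To make the first finding more concrete, here is a minimal sketch of one way tweet-level features could be compared with state-level government figures: average the tweet embeddings per state, then compute a Pearson correlation between one feature dimension and the reported prevalence. Mean pooling, the helper names, the placeholder state codes, and every number below are assumptions chosen for illustration; they are not the authors' procedure or data.

```python
from collections import defaultdict
from math import sqrt


def mean_pool_by_state(tweets):
    """Average embeddings per state.

    tweets: iterable of (state, embedding) pairs, where embedding is a
    list of floats. Returns {state: mean embedding}.
    """
    sums = {}
    counts = defaultdict(int)
    for state, emb in tweets:
        if state not in sums:
            sums[state] = [0.0] * len(emb)
        sums[state] = [s + e for s, e in zip(sums[state], emb)]
        counts[state] += 1
    return {st: [s / counts[st] for s in vec] for st, vec in sums.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy inputs: placeholder state codes, 2-dimensional "embeddings",
# and invented prevalence percentages.
toy_tweets = [
    ("AA", [0.2, 0.9]), ("AA", [0.4, 0.7]),
    ("BB", [0.6, 0.5]), ("BB", [0.8, 0.3]),
    ("CC", [1.0, 0.1]),
]
toy_prevalence = {"AA": 20.0, "BB": 30.0, "CC": 40.0}

state_features = mean_pool_by_state(toy_tweets)
states = sorted(state_features)
feature_0 = [state_features[s][0] for s in states]
target = [toy_prevalence[s] for s in states]

# Correlation between the first pooled feature and the toy prevalence.
r = pearson(feature_0, target)
```

In a real evaluation the features would come from the learned tweet embeddings and the targets from published government statistics, typically with a regression model and held-out states rather than a single feature dimension.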
Abstract
Twitter data has been shown to be broadly applicable for public health surveillance. Previous public health studies based on Twitter data have largely relied on keyword matching or topic models to cluster relevant tweets. However, both methods suffer from the short length of the texts and the unpredictable noise that naturally occurs in user-generated content. In response, we introduce a deep learning approach that uses hashtags as a form of supervision and learns tweet embeddings for extracting informative textual features. In this case study, we address the specific task of estimating state-level obesity prevalence from dietary-related textual features. Our approach yields estimates that correlate strongly with government data and outperforms the keyword-matching baseline. The results also demonstrate the potential of discovering risk factors using the textual features. This…
Taxonomy
Topics: Data-Driven Disease Surveillance · Smoking Behavior and Cessation · Computational and Text Analysis Methods
