Analyzing the Language of Food on Social Media
Daniel Fried, Mihai Surdeanu, Stephen Kobourov, Melanie Hingle, Dane Bell

TL;DR
This study shows that the language of food-related Twitter posts can predict community characteristics such as overweight rate, diabetes rate, political leaning, and authors' home location, and that more complex NLP, such as topic modeling, improves these predictions further.
Contribution
The paper presents a large-scale analysis of over three million food-related Twitter posts, links their language to demographic and health indicators, and describes an online system for real-time query and visualization of the dataset.
Findings
Language predicts overweight and diabetes rates
Language correlates with political leaning and location
More complex NLP, such as topic modeling, further improves performance
Abstract
We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have most predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced…
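The comparison the abstract describes, a language-based model against a majority-class baseline, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy posts, the "above"/"below" labels (standing in for something like an overweight rate above or below a threshold), and the choice of a bag-of-words naive Bayes classifier are all assumptions made purely for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization; hashtags are kept as ordinary tokens.
    return text.lower().split()

def majority_baseline(train_labels):
    # Always predict the most frequent training label.
    return Counter(train_labels).most_common(1)[0][0]

def train_naive_bayes(docs, labels):
    # Collect per-class document counts and per-class word counts.
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, doc):
    # Multinomial naive Bayes with add-one smoothing; unseen words are skipped.
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(doc):
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Hypothetical toy data, invented for this sketch (not from the paper's corpus).
train_docs = [
    "bacon burger fries #dinner",
    "fried chicken and grits #lunch",
    "kale salad quinoa #lunch",
    "vegan smoothie bowl #breakfast",
    "bbq ribs fries #dinner",
]
train_labels = ["above", "above", "below", "below", "above"]
test_docs = ["burger and fries", "quinoa salad bowl", "kale smoothie"]
test_labels = ["above", "below", "below"]

model = train_naive_bayes(train_docs, train_labels)
model_predictions = [predict(model, doc) for doc in test_docs]
baseline_label = majority_baseline(train_labels)
baseline_predictions = [baseline_label] * len(test_docs)

model_accuracy = accuracy(model_predictions, test_labels)
baseline_accuracy = accuracy(baseline_predictions, test_labels)
print(f"baseline: {baseline_accuracy:.2f}, language model: {model_accuracy:.2f}")
```

On a toy set this small the numbers mean nothing in themselves; the point is only the shape of the evaluation, where a model that reads the text is scored against a baseline that ignores it.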
