# Correlating Twitter Language with Community-Level Health Outcomes

**Authors:** Arno Schneuwly, Ralf Grubenmann, Séverine Rion Logean, Mark Cieliebak, Martin Jaggi

arXiv: 1906.06465 · 2019-06-25

## TL;DR

This paper presents a model that links social media language to community-level health outcomes such as atherosclerotic heart disease (AHD), diabetes, and cancer. The model predicts these outcomes from language and surfaces correlations with lifestyle and socioeconomic factors.

## Contribution

It introduces an approach that combines sentence embeddings with a regression model and clustering to predict community-level health outcomes from social media language, without requiring additional labeled data.

## Key findings

- Predicts community-level medical outcomes from Twitter language
- Discovers known and potentially novel correlations with lifestyle and socioeconomic risk factors
- Applicable to a wide range of health outcomes and other target variables

## Abstract

We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.
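The pipeline named in the abstract (sentence embeddings, then a regression model, then clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes mean-pooling of tweet embeddings per community, closed-form ridge regression, and plain k-means, none of which the abstract specifies. Random vectors stand in for real sentence embeddings, the health outcome is synthetic, and every function name (`community_features`, `fit_ridge`, `kmeans`) is hypothetical.

```python
import numpy as np


def community_features(tweet_vecs, community_ids, n_communities):
    """Mean-pool tweet embeddings per community (an assumed aggregation)."""
    sums = np.zeros((n_communities, tweet_vecs.shape[1]))
    np.add.at(sums, community_ids, tweet_vecs)  # unbuffered per-community sum
    counts = np.bincount(community_ids, minlength=n_communities)
    return sums / np.maximum(counts, 1)[:, None]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) per row."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; empty clusters keep their previous center."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return assign(X, centers), centers


# Synthetic stand-in data: 50 communities, 40 "tweets" each, 8-dim embeddings.
rng = np.random.default_rng(0)
n_communities, tweets_per_community, dim = 50, 40, 8
community_ids = np.repeat(np.arange(n_communities), tweets_per_community)
community_shift = rng.normal(size=(n_communities, dim))
tweet_vecs = rng.normal(size=(len(community_ids), dim)) + community_shift[community_ids]

# Step 1: one feature vector per community.
X = community_features(tweet_vecs, community_ids, n_communities)

# Synthetic community-level outcome (a real study would use health statistics).
true_w = rng.normal(size=dim)
y = X @ true_w + 0.1 * rng.normal(size=n_communities)

# Step 2: regress the outcome on community language features.
w, b = fit_ridge(X, y, alpha=1.0)
pred = X @ w + b

# Step 3: cluster individual tweets and score each cluster with the fitted model,
# so clusters can be ranked by how strongly they are associated with the outcome.
labels, centers = kmeans(tweet_vecs, k=5)
cluster_scores = centers @ w + b
ranking = np.argsort(cluster_scores)[::-1]
print("clusters ranked by predicted outcome:", ranking)
```

Scoring cluster centers with the community-level model is one simple way to connect topics of language to the outcome; a real analysis would evaluate the regression on held-out communities and inspect the text in each cluster rather than rely on in-sample fit.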

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.06465/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1906.06465/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1906.06465/full.md

---
Source: https://tomesphere.com/paper/1906.06465