The Healthy States of America: Creating a Health Taxonomy with Social Media

Sanja Scepanovic; Luca Maria Aiello; Ke Zhou; Sagar Joglekar; Daniele Quercia

arXiv:2103.01169·cs.CY·March 2, 2021

TL;DR

This paper introduces a novel deep learning NLP tool that automatically extracts and categorizes medical conditions from social media, creating a comprehensive taxonomy validated against ICD-11 and linked to disease prevalence data.

Contribution

The authors developed the first automated taxonomy of medical conditions from social media discussions, validated it against ICD-11, and linked social media mentions to official disease prevalence.

Findings

01

Created a taxonomy matching 20 of 22 ICD-11 categories

02

Validated disease mention clusters against official classifications

03

Linked social media health scores with actual disease prevalence

Abstract

Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related…
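The pipeline the abstract describes (extract condition mentions, build a co-occurrence network, cluster it) can be illustrated with a minimal sketch. It assumes the mentions have already been extracted from each post, and it substitutes a simple stand-in for the clustering step: connected components over edges that meet a co-occurrence threshold. The abstract does not name the clustering method the authors used, so this is not their algorithm. The function names, the `min_weight` parameter, and the toy posts are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how often each unordered pair of conditions appears in the same post."""
    counts = Counter()
    for mentions in posts:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        for a, b in combinations(sorted(set(mentions)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_conditions(counts, min_weight=2):
    """Group conditions linked by edges with weight >= min_weight.

    Stand-in for community detection: plain connected components
    over the thresholded co-occurrence graph (an assumption, not the
    paper's method).
    """
    neighbours = defaultdict(set)
    for (a, b), weight in counts.items():
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy input: conditions already extracted from each post.
posts = [
    ["asthma", "bronchitis"],
    ["asthma", "bronchitis", "cough"],
    ["cough", "asthma"],
    ["depression", "anxiety"],
    ["anxiety", "depression", "insomnia"],
    ["insomnia", "asthma"],
]
counts = cooccurrence_counts(posts)
clusters = cluster_conditions(counts, min_weight=2)
print(clusters)
```

On this toy input the respiratory and mental-health conditions fall into separate groups, while "insomnia" is dropped because none of its pairs reaches the threshold. A real analysis would likely use a weighted community detection method rather than a hard threshold, which discards weak but informative links.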
