Automatically Detecting Self-Reported Birth Defect Outcomes on Twitter for Large-scale Epidemiological Research

Ari Z. Klein; Abeed Sarker; Davy Weissenbacher; Graciela Gonzalez-Hernandez

arXiv:1810.09506·cs.CL·October 3, 2019·6 cites

Open Access

TL;DR

This study develops and evaluates machine learning methods to automatically identify tweets reporting birth defect outcomes, aiming to enable large-scale epidemiological research using social media data.

Contribution

It introduces NLP and supervised learning techniques for detecting relevant tweets, compares various classifiers and sampling strategies, and provides a publicly available dataset for future research.

Findings

01

A Support Vector Machine (SVM) classifier achieved an F1-score of 0.65 for the 'defect' class.

02

Deep learning classifiers were also evaluated but did not outperform the SVM.

03

Error analysis offers insights for improving classification accuracy.
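
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how an F1-score for a single class can be computed from gold and predicted labels. It is not the authors' evaluation code. The function name `class_f1`, the label "other", and the six example labels are assumptions made up for illustration; only the class name 'defect' comes from the findings above.

```python
from typing import Sequence


def class_f1(y_true: Sequence[str], y_pred: Sequence[str], target: str) -> float:
    """Return the F1-score for one class, treating `target` as the positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: gold label and prediction are both the target class.
    tp = sum(1 for t, p in pairs if t == target and p == target)
    # False positives: predicted as target, but the gold label is something else.
    fp = sum(1 for t, p in pairs if t != target and p == target)
    # False negatives: gold label is target, but the prediction missed it.
    fn = sum(1 for t, p in pairs if t == target and p != target)

    if tp == 0:
        # With no true positives, precision or recall is zero (or undefined),
        # so the F1-score is reported as 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only; this is not data from the study.
gold = ["defect", "other", "defect", "other", "defect", "other"]
pred = ["defect", "defect", "other", "other", "defect", "other"]
score = class_f1(gold, pred, "defect")
print(f"F1 for 'defect': {score:.2f}")
```

Reporting F1 for the 'defect' class alone, rather than overall accuracy, is a common choice when the class of interest is rare, because accuracy can look high even when few positive examples are found.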

Abstract

In recent work, we identified and studied a small cohort of Twitter users whose pregnancies with birth defect outcomes could be observed via their publicly available tweets. Exploiting social media's large-scale potential to complement the limited methods for studying birth defects, the leading cause of infant mortality, depends on the further development of automatic methods. The primary objective of this study was to take the first step towards scaling the use of social media for observing pregnancies with birth defect outcomes, namely, developing methods for automatically detecting tweets by users reporting their birth defect outcomes. We annotated and pre-processed approximately 23,000 tweets that mention birth defects in order to train and evaluate supervised machine learning algorithms, including feature-engineered and deep learning-based classifiers. We also experimented with…

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection