Automatically Detecting Self-Reported Birth Defect Outcomes on Twitter for Large-scale Epidemiological Research
Ari Z. Klein, Abeed Sarker, Davy Weissenbacher, Graciela Gonzalez-Hernandez

TL;DR
This study develops and evaluates machine learning methods to automatically identify tweets reporting birth defect outcomes, aiming to enable large-scale epidemiological research using social media data.
Contribution
It applies natural language processing (NLP) and supervised machine learning to detect relevant tweets, compares feature-engineered and deep learning-based classifiers as well as sampling strategies, and provides a publicly available dataset for future research.
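This summary does not say which sampling strategies were compared. For illustration only, one common strategy for imbalanced tweet data is random undersampling, which shrinks every class to the size of the smallest class before training. The sketch below assumes each example is a (text, label) pair. The helper `undersample` and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(examples, seed=0):
    """Randomly undersample every class down to the size of the smallest class.

    `examples` is a list of (text, label) pairs. Hypothetical helper for
    illustration; not the paper's implementation.
    """
    if not examples:
        return []
    # Group examples by their label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    # Every class is reduced to the size of the rarest class.
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)  # avoid class-ordered batches
    return balanced


# Toy data: 8 majority-class tweets and 2 minority-class tweets (invented).
data = [(f"t{i}", "non-defect") for i in range(8)] + [(f"d{i}", "defect") for i in range(2)]
balanced = undersample(data)
```

Undersampling discards majority-class tweets. Oversampling the minority class or weighting classes in the training loss are alternatives that keep all of the data.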
Findings
A Support Vector Machine (SVM) classifier achieved an F1-score of 0.65 for the 'defect' class.
Deep learning classifiers were also evaluated but did not outperform the SVM.
Error analysis offers insights for improving classification accuracy.
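The F1-score reported for the 'defect' class is the harmonic mean of precision and recall, computed by treating that class as positive and all other classes as negative. The sketch below shows this computation. The function name `class_f1` and the toy labels are hypothetical and are not the paper's data.

```python
def class_f1(gold, predicted, positive="defect"):
    """Return (precision, recall, F1) for one class, treated as the positive class."""
    pairs = list(zip(gold, predicted))
    # True positives: gold and prediction both equal the positive class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted positive, but gold is another class.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: gold is positive, but predicted as another class.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 2 true positives, 1 false positive, 1 false negative.
gold = ["defect", "defect", "defect", "non-defect", "non-defect"]
pred = ["defect", "defect", "non-defect", "defect", "non-defect"]
precision, recall, f1 = class_f1(gold, pred)
```

In this toy example, precision and recall are both 2/3, so F1 is also 2/3. A per-class score like this is more informative than accuracy when the positive class is rare.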
Abstract
In recent work, we identified and studied a small cohort of Twitter users whose pregnancies with birth defect outcomes could be observed via their publicly available tweets. Exploiting social media's large-scale potential to complement the limited methods for studying birth defects, the leading cause of infant mortality, depends on the further development of automatic methods. The primary objective of this study was to take the first step towards scaling the use of social media for observing pregnancies with birth defect outcomes, namely, developing methods for automatically detecting tweets by users reporting their birth defect outcomes. We annotated and pre-processed approximately 23,000 tweets that mention birth defects in order to train and evaluate supervised machine learning algorithms, including feature-engineered and deep learning-based classifiers. We also experimented with…
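The abstract says the tweets were pre-processed but does not list the steps. The sketch below shows typical tweet normalization, assuming lowercasing, replacing URLs and user mentions with placeholder tokens, dropping the '#' from hashtags, and collapsing whitespace. These steps, the placeholder tokens `<url>` and `<user>`, and the function name `preprocess` are assumptions for illustration, not the authors' documented pipeline.

```python
import re

# Assumed normalization rules; not taken from the paper.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess(tweet):
    """Normalize a raw tweet string before feature extraction (hypothetical steps)."""
    text = tweet.lower()
    text = URL_RE.sub("<url>", text)        # replace links with a placeholder token
    text = MENTION_RE.sub("<user>", text)   # anonymize user mentions
    text = HASHTAG_RE.sub(r"\1", text)      # keep the hashtag word, drop the '#'
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text


cleaned = preprocess("My son was born with #SpinaBifida @friend https://t.co/abc")
```

URLs are replaced before mentions so that any '@' inside a link is not mistaken for a user mention.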
