Streamlining Social Media Information Retrieval for COVID-19 Research with Deep Learning
Yining Hua, Jiageng Wu, Shixu Lin, Minghui Li, Yujie Zhang, Dinah Foer, Siwen Wang, Peilin Zhou, Jie Yang, Li Zhou

TL;DR
This paper presents a novel pipeline that leverages deep learning to automatically curate a comprehensive symptom dictionary from COVID-19 related social media data, enhancing epidemic surveillance and public health research.
Contribution
It introduces a systematic, multi-module approach for extracting and mapping colloquial medical terms from social media to standardized medical concepts, improving accuracy and coverage.
Findings
Identified nearly 500,000 unique symptom expressions from tweets.
Achieved 95% accuracy in mapping symptoms to UMLS concepts.
Detected more symptoms, including psychiatric disorders, than traditional lexicons.
Abstract
Objective: Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword matching. This study develops a system to streamline the process of curating colloquial medical dictionaries. We demonstrate the pipeline by curating a UMLS-colloquial symptom dictionary from COVID-19-related tweets as proof of concept. Methods: COVID-19-related tweets from February 1, 2020, to April 30, 2022, were used. The pipeline includes three modules: a named entity recognition module to detect symptoms in tweets; an entity normalization module to aggregate detected entities; and a mapping module that iteratively maps entities to Unified Medical Language System concepts. A random sample of 500 entities was drawn from the final dictionary for accuracy validation. Additionally, we conducted a symptom frequency distribution analysis to compare…
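The three modules described in the Methods (detect, normalize, map) can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses a deep learning named entity recognition model and an iterative mapping to UMLS, whereas the sketch below swaps in regular expressions and a single dictionary lookup. All names here (TOY_SYMPTOM_PATTERNS, TOY_CONCEPT_MAP, detect_symptoms, normalize, build_dictionary) and the concept labels are hypothetical placeholders, not real UMLS identifiers.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a trained NER model: a few regex patterns.
TOY_SYMPTOM_PATTERNS = [r"can'?t smell", r"sore throat", r"feeling anxious"]

# Hypothetical concept table. The labels are placeholders, NOT real UMLS CUIs.
TOY_CONCEPT_MAP = {
    "cant smell": "CONCEPT:loss_of_smell",
    "sore throat": "CONCEPT:sore_throat",
}


def detect_symptoms(tweet):
    """Module 1 (toy): return symptom mentions found in a tweet."""
    text = tweet.lower()
    found = []
    for pattern in TOY_SYMPTOM_PATTERNS:
        found.extend(match.group(0) for match in re.finditer(pattern, text))
    return found


def normalize(entity):
    """Module 2 (toy): lowercase, drop punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", entity.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def build_dictionary(tweets):
    """Module 3 (toy): group normalized mentions under a concept label.

    Mentions with no concept are returned separately so that a later
    pass (for example, manual review) could handle them.
    """
    dictionary = defaultdict(set)
    unmapped = set()
    for tweet in tweets:
        for entity in detect_symptoms(tweet):
            norm = normalize(entity)
            concept = TOY_CONCEPT_MAP.get(norm)
            if concept is None:
                unmapped.add(norm)
            else:
                dictionary[concept].add(norm)
    return dict(dictionary), unmapped


if __name__ == "__main__":
    tweets = [
        "I can't smell anything today",
        "Cant smell my coffee, sore throat too",
        "Feeling anxious about all this",
    ]
    mapped, unmapped = build_dictionary(tweets)
    print(mapped)
    print(unmapped)
```

In this sketch, the two spellings "can't smell" and "Cant smell" collapse to one normalized entry before lookup, which is the role the normalization module plays between detection and mapping.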
Taxonomy
Topics: Mental Health via Writing · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining
