Streamlining Social Media Information Retrieval for COVID-19 Research with Deep Learning

Yining Hua; Jiageng Wu; Shixu Lin; Minghui Li; Yujie Zhang; Dinah Foer; Siwen Wang; Peilin Zhou; Jie Yang; Li Zhou

arXiv:2306.16001 · cs.CL · March 19, 2024

TL;DR

This paper presents a novel pipeline that leverages deep learning to automatically curate a comprehensive symptom dictionary from COVID-19 related social media data, enhancing epidemic surveillance and public health research.

Contribution

It introduces a systematic, multi-module approach for extracting and mapping colloquial medical terms from social media to standardized medical concepts, improving accuracy and coverage.

Findings

01 Identified nearly 500,000 unique symptom expressions from tweets.

02 Achieved 95% accuracy in mapping symptoms to UMLS concepts.

03 Detected more symptoms, including psychiatric disorders, than traditional lexicons.

Abstract

Objective: Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword matching. This study develops a system to streamline the process of curating colloquial medical dictionaries. We demonstrate the pipeline by curating a UMLS-colloquial symptom dictionary from COVID-19-related tweets as a proof of concept.

Methods: COVID-19-related tweets from February 1, 2020, to April 30, 2022, were used. The pipeline includes three modules: a named entity recognition module to detect symptoms in tweets; an entity normalization module to aggregate detected entities; and a mapping module that iteratively maps entities to Unified Medical Language System (UMLS) concepts. A random sample of 500 entities was drawn from the final dictionary for accuracy validation. Additionally, we conducted a symptom frequency distribution analysis to compare…
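
The three-module flow described in the Methods (detect, normalize, map) can be sketched roughly as follows. This is a toy illustration under loud assumptions, not the authors' implementation: the regex in `detect_symptoms` is a stand-in for the deep learning named entity recognition module, the `CONCEPTS` table and its IDs are hypothetical placeholders rather than real UMLS identifiers, and the "iterative" mapping is approximated here by string-similarity matching with progressively relaxed thresholds, a scheme the abstract does not specify.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy concept table; the IDs are placeholders, not real UMLS CUIs.
CONCEPTS = {
    "fever": "C_FEVER",
    "loss of smell": "C_ANOSMIA",
    "headache": "C_HEADACHE",
}

# Stand-in for the NER module: a trivial pattern matcher, not a trained model.
SYMPTOM_PATTERN = re.compile(
    r"fevers?|head\s?aches?|loss of smell|lost my smell", re.IGNORECASE
)


def detect_symptoms(tweet):
    """Module 1 (stand-in): return symptom mentions found in one tweet."""
    return SYMPTOM_PATTERN.findall(tweet)


def normalize(entity):
    """Module 2: lowercase, drop non-letters, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", entity.lower())
    return re.sub(r"\s+", " ", text).strip()


def map_entities(entities, thresholds=(1.0, 0.8, 0.6)):
    """Module 3 (assumed scheme): map entities to concepts in several passes,
    relaxing the similarity threshold on each pass."""
    mapping = {}
    remaining = set(entities)
    for threshold in thresholds:
        for entity in sorted(remaining):
            # Pick the concept name most similar to this entity.
            best_name = max(
                CONCEPTS, key=lambda name: SequenceMatcher(None, entity, name).ratio()
            )
            score = SequenceMatcher(None, entity, best_name).ratio()
            if score >= threshold:
                mapping[entity] = CONCEPTS[best_name]
        remaining -= set(mapping)
    return mapping, remaining


def run_pipeline(tweets):
    """Chain the three modules and return counts, mapping, and unmapped entities."""
    counts = Counter(normalize(e) for t in tweets for e in detect_symptoms(t))
    mapping, unmapped = map_entities(counts)
    return counts, mapping, unmapped


if __name__ == "__main__":
    tweets = ["Day 3: Fever and a bad head ache", "lost my smell, still have fevers"]
    counts, mapping, unmapped = run_pipeline(tweets)
    print(counts)
    print(mapping)
    print(unmapped)
```

Aggregating normalized entities before mapping means each distinct expression is mapped once, however often it appears; entities that never clear the loosest threshold stay in the unmapped set, where a real system would presumably route them to manual review.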

Taxonomy

Topics: Mental Health via Writing · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining