Extracting Post-Acute Sequelae of SARS-CoV-2 Infection Symptoms from Clinical Notes via Hybrid Natural Language Processing
Zilong Bai, Zihan Xu, Cong Sun, Chengxi Zang, H. Timothy Bunnell, Catherine Sinfield, Jacqueline Rutter, Aaron Thomas Martinez, L. Charles Bailey, Mark Weiner, Thomas R. Campion, Thomas Carton, Christopher B. Forrest, Rainu Kaushal, Fei Wang, Yifan Peng

TL;DR
This paper presents a hybrid NLP pipeline combining rule-based and BERT models to extract and detect PASC symptoms from clinical notes, improving diagnosis accuracy and efficiency across multiple health systems.
Contribution
We developed a novel hybrid NLP approach with a comprehensive PASC lexicon, validated across multiple sites, demonstrating high accuracy and processing speed for symptom extraction from clinical notes.
Findings
Achieved an average F1 score of 0.82 internally and 0.76 externally.
Processed notes in approximately 2.45 seconds each.
Showed strong correlation ($\rho > 0.83$) between model mentions and clinical diagnoses.
Abstract
Accurately and efficiently diagnosing Post-Acute Sequelae of COVID-19 (PASC) remains challenging due to its myriad symptoms that evolve over long- and variable-time intervals. To address this issue, we developed a hybrid natural language processing pipeline that integrates rule-based named entity recognition with BERT-based assertion detection modules for PASC-symptom extraction and assertion detection from clinical notes. We developed a comprehensive PASC lexicon with clinical specialists. From 11 health systems of the RECOVER initiative network across the U.S., we curated 160 intake progress notes for model development and evaluation, and collected 47,654 progress notes for a population-level prevalence study. We achieved an average F1 score of 0.82 in one-site internal validation and 0.76 in 10-site external validation for assertion detection. Our pipeline processed each note at…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Sentiment Analysis and Opinion Mining · Mental Health via Writing
