Identification and validation of respiratory virus immunization using natural language processing
Kevin A. Wilson, John J. Riddles, Andrew C. Hill, Elizabeth A. Bassett, Mengshi Zhou, Michelle Barron, Catia Chavez, Rahul Shrivastava, Anil Battalahalli, Daniel Chacreton, Ethan Moran, Elizabeth Rowley, Zachary A. Weber, Lawrence Reichle, Sarah W. Ball, Amanda B. Payne

TL;DR
The paper introduces a rule-based NLP algorithm to detect respiratory virus immunizations (COVID-19, influenza, and RSV) in electronic health record text. For COVID-19, recall was high against manual review (97%) but low against structured data (9%).
Contribution
A novel rule-based NLP algorithm was developed and validated for identifying respiratory virus immunizations in unstructured EHR text.
Findings
The algorithm achieved high recall (97% for COVID-19) when compared to manual review but low recall (9% for COVID-19) when compared to structured data.
For influenza and RSV immunizations, the method showed high precision and moderate recall.
The algorithm can augment structured immunization records by extracting data from narrative EHR text.
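The two recall figures above are measured against different reference standards, so it may help to see how the metric depends on the reference set. The following is a minimal sketch, not the authors' evaluation code: the function name `precision_recall` and the patient identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted positives against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: share of algorithm-flagged records that the reference confirms.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference records that the algorithm found.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical patient IDs, for illustration only (not data from the paper).
nlp_flagged = {"p1", "p2"}
small_reference = {"p1", "p2"}                 # e.g. a manually reviewed sample
large_reference = {"p1", "p2", "p3", "p4"}     # e.g. a broader structured record set

print(precision_recall(nlp_flagged, small_reference))
print(precision_recall(nlp_flagged, large_reference))
```

With the same flagged set, recall changes as the reference set changes, while precision here stays the same. This only illustrates the arithmetic; it does not explain the specific gap reported in the paper.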
Abstract
Electronic health record (EHR)-based research often relies on structured data elements, such as ICD-10-CM and CPT codes, to identify clinical diagnoses and procedures. However, some information, such as the administration of immunizations, may be captured more reliably in the text-based narrative sections of the patient's record. We developed a rule-based natural language processing (NLP) algorithm to identify the administration of immunizations for COVID-19, influenza, and RSV using a combination of synthetic and publicly available data. After applying standard NLP processing techniques to clean and standardize the text, we implemented a multi-stage, rule-based algorithm. We applied a dictionary of general keywords to identify potential immunizations, and a set of specific keywords, which leveraged grammatical dependencies in the text, to increase accuracy. We implemented additional…
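The first two stages described in the abstract (a general keyword dictionary, then specific keywords) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the keyword lists, the `WINDOW` size, and the function names are all hypothetical, and a simple token-proximity window stands in for the grammatical-dependency matching the paper describes, since that step needs a syntactic parser. The steps elided from the truncated abstract are not reproduced.

```python
import re

# Hypothetical dictionaries; the paper's actual keyword lists are not shown here.
GENERAL_KEYWORDS = {"vaccine", "vaccination", "vaccinated", "immunization", "shot", "booster"}
SPECIFIC_KEYWORDS = {
    "covid-19": {"covid", "covid-19", "sars-cov-2"},
    "influenza": {"influenza", "flu"},
    "rsv": {"rsv"},
}
WINDOW = 4  # assumed proximity window, a stand-in for dependency-based linking


def normalize(text):
    """Lowercase the note and split it into simple alphanumeric/hyphen tokens."""
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())


def find_immunizations(text):
    """Return the set of viruses with an immunization mention in the text."""
    tokens = normalize(text)
    found = set()
    for i, token in enumerate(tokens):
        # Stage 1: a general keyword marks a potential immunization mention.
        if token not in GENERAL_KEYWORDS:
            continue
        # Stage 2: look for a virus-specific keyword near the general keyword.
        nearby = tokens[max(0, i - WINDOW): i + WINDOW + 1]
        for virus, keywords in SPECIFIC_KEYWORDS.items():
            if any(word in keywords for word in nearby):
                found.add(virus)
    return found


print(find_immunizations("Patient received the flu vaccine today."))
print(find_immunizations("History of influenza last winter."))
```

The second note mentions influenza but contains no general immunization keyword, so nothing is flagged. A proximity window is cruder than dependency parsing and would misfire on longer sentences with several unrelated mentions.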
Taxonomy
Topics: Data-Driven Disease Surveillance · Vaccine Coverage and Hesitancy · Machine Learning in Healthcare
