Extracting COVID-19 Diagnoses and Symptoms From Clinical Text: A New Annotated Corpus and Neural Event Extraction Framework
Kevin Lybarger, Mari Ostendorf, Matthew Thompson, Meliha Yetisgen

TL;DR
This paper introduces a new annotated clinical text corpus and a neural event extraction framework to identify COVID-19 diagnoses and symptoms from clinical notes, aiding large-scale epidemiological and clinical research.
Contribution
The paper contributes a new annotated COVID-19 clinical corpus and a span-based event extraction model that identifies COVID-19 related events and their assertion values with high F1.
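To make "span-based" concrete: extractors of this kind typically enumerate candidate token spans and then score each one as a possible trigger or argument. The sketch below shows only the enumeration step. The `Span` class, the `enumerate_spans` function, the `max_width` parameter, and the example sentence are hypothetical illustrations and are not taken from the paper, whose architecture is not detailed in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """A candidate span over a token sequence (hypothetical structure)."""
    start: int  # inclusive token index
    end: int    # exclusive token index
    text: str


def enumerate_spans(tokens: List[str], max_width: int = 3) -> List[Span]:
    """List every contiguous token span of at most max_width tokens."""
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append(Span(start, end, " ".join(tokens[start:end])))
    return spans


# Made-up example sentence, not drawn from the corpus.
tokens = "Patient denies fever or cough".split()
candidates = enumerate_spans(tokens, max_width=2)
```

In a full model, each candidate would then be classified (for example as a symptom trigger or an assertion argument); that scoring step is omitted here.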
Findings
High extraction performance: 0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions.
Automatically extracted symptoms improve COVID-19 test result prediction.
The corpus enables better understanding of COVID-19 clinical presentation.
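For readers unfamiliar with the F1 metric reported above, here is a minimal sketch of how it can be computed over sets of extracted items. It assumes an exact-match criterion, which may differ from the paper's scoring rules; the `f1_score` function and the tuples below are invented for illustration and are not results from the paper.

```python
def f1_score(gold: set, predicted: set) -> float:
    """Harmonic mean of precision and recall under exact matching."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical items: (event type, start token, end token, assertion value).
gold = {
    ("Symptom", 2, 3, "absent"),
    ("Symptom", 4, 5, "absent"),
}
predicted = {
    ("Symptom", 2, 3, "absent"),
    ("Symptom", 4, 5, "present"),  # right span, wrong assertion value
}

score = f1_score(gold, predicted)
```

Here one of two predictions matches one of two gold items, so precision and recall are both 0.5 and the resulting F1 is 0.5.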
Abstract
Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and…
