Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study
Shubham Agarwal, Thomas Searle, Mart Ratas, Anthony Shek, James Teo, Richard Dobson

TL;DR
This study compares various NLP models for extracting and classifying clinical event contexts from electronic health records, highlighting the superior performance of transformer-based models like BERT with class imbalance techniques.
Contribution
It provides a comprehensive comparison of NLP models for medical text classification and demonstrates the effectiveness of BERT combined with imbalance mitigation in clinical data extraction.
Findings
BERT outperforms Bi-LSTM by up to 28% in recall for minority classes.
Transformer models show superior performance in classifying clinical event contexts.
The approach is integrated into the CogStack/MedCAT framework and publicly available.
Abstract
Electronic Health Records are large repositories of valuable clinical data, with a significant portion stored in unstructured text format. This textual data includes clinical events (e.g., disorders, symptoms, findings, medications and procedures) in context that if extracted accurately at scale can unlock valuable downstream applications such as disease prediction. Using an existing Named Entity Recognition and Linking methodology, MedCAT, these identified concepts need to be further classified (contextualised) for their relevance to the patient, and their temporal and negated status for example, to be useful downstream. This study performs a comparative analysis of various natural language models for medical text classification. Extensive experimentation reveals the effectiveness of transformer-based language models, particularly BERT. When combined with class imbalance mitigation…
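The text above does not say which class imbalance mitigation technique was used, so the following is only a minimal sketch of one common option, inverse-frequency class weighting, together with the per-class recall metric that the findings report. The function names, the label names ("affirmed", "negated"), and the toy data are all hypothetical and are not taken from the paper or from MedCAT.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    This is one common heuristic, assumed here for illustration only.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted instances / true instances."""
    true_counts = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in true_counts.items()}


# Hypothetical, imbalanced toy labels (8 majority vs 2 minority examples).
y_true = ["affirmed"] * 8 + ["negated"] * 2
# A hypothetical classifier that misses one of the two minority examples.
y_pred = ["affirmed"] * 8 + ["affirmed", "negated"]

weights = inverse_frequency_weights(y_true)
recall = per_class_recall(y_true, y_pred)

print(weights)
print(recall)
```

Weights like these are typically passed to a weighted loss function during training, so that errors on the minority class cost more. Reporting recall per class, rather than overall accuracy, is what makes a gain on minority classes visible at all.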
Taxonomy
Topics: Machine Learning in Healthcare
Methods: Attention Is All You Need · Softmax · Linear Layer · Adam · Weight Decay · Dropout · Layer Normalization · Dense Connections · Attention Dropout
