Training without training data: Improving the generalizability of automated medical abbreviation disambiguation

Marta Skreta; Aryan Arbabi; Jixuan Wang; Michael Brudno

arXiv:1912.06174 · cs.LG · December 16, 2019 · 5 cites

Open Access

TL;DR

This paper introduces a data augmentation method based on related medical concepts and adds note-level global context to improve the generalizability of medical abbreviation disambiguation models trained on limited labeled data, raising accuracy by nearly 14% on CASI and 4% on i2b2.

Contribution

The paper presents a new data augmentation technique using related medical concepts and incorporates global context, improving model generalization for abbreviation disambiguation.

Findings

01. Accuracy increased by nearly 14% on the CASI dataset.
02. Accuracy improved by 4% on the i2b2 dataset.
03. Global context and data augmentation significantly enhance model performance.

Abstract

Abbreviation disambiguation is important for automated clinical note processing due to the frequent use of abbreviations in clinical settings. Current models for automated abbreviation disambiguation are restricted by the scarcity and imbalance of labeled training data, decreasing their generalizability to orthogonal sources. In this work we propose a novel data augmentation technique that utilizes information from related medical concepts, which improves our model's ability to generalize. Furthermore, we show that incorporating the global context information within the whole medical note (in addition to the traditional local context window) can significantly improve the model's representation for abbreviations. We train our model on a public dataset (MIMIC III) and test its performance on datasets from different sources (CASI, i2b2). Together, these two techniques boost the accuracy…
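The two ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`local_context`, `global_context`, `augment`), the bag-of-words representation, the example note, and the related-concept mapping are all assumptions made for illustration. It only shows the distinction between a local window and whole-note context, and the general idea of borrowing samples from related concepts when an expansion has little data.

```python
from collections import Counter


def local_context(tokens, index, window=2):
    """Tokens within `window` positions of the abbreviation, excluding it."""
    start = max(0, index - window)
    left = tokens[start:index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def global_context(tokens, index):
    """Bag of words over the whole note, excluding the abbreviation itself."""
    return Counter(tokens[:index] + tokens[index + 1:])


def augment(samples, related, expansion):
    """Extend an expansion's samples with those of its related concepts.

    `samples` maps a concept to its example sentences; `related` maps a
    concept to a list of related concepts (both hypothetical structures).
    """
    borrowed = []
    for concept in related.get(expansion, []):
        borrowed.extend(samples.get(concept, []))
    return samples.get(expansion, []) + borrowed


# Hypothetical note containing the ambiguous abbreviation "ra".
note = "pt admitted with chest pain ra clear no edema".split()
idx = note.index("ra")

print(local_context(note, idx))           # nearby tokens only
print(global_context(note, idx)["pain"])  # count of "pain" across the note
```

In a real system these contexts would likely feed learned representations rather than raw token counts; the sketch keeps them as plain lists and counters so the difference between the two context scopes is easy to see.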

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques

Methods: Test