MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score

Sunjae Kwon; Zonghai Yao; Harmon S. Jordan; David A. Levy; Brian Corner; Hong Yu

arXiv:2210.05875·cs.CL·October 13, 2022

TL;DR

This paper introduces MedJEx, a novel NLP model for extracting medical jargon from EHR notes, leveraging Wikipedia hyperlinks and contextual language models, with improved performance demonstrated on multiple datasets.

Contribution

The paper presents a new dataset and a novel extraction model that outperforms existing methods by utilizing Wikipedia hyperlink spans and contextualized language scores.

Findings

1. MedJEx outperforms existing NLP models in medical jargon extraction.

2. Training on Wikipedia hyperlink spans improves biomedical NER benchmarks.

3. Contextualized masked language model scores enhance jargon detection.
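
To make the third finding concrete, here is a minimal sketch of one way a contextualized masked language model score could be computed: mask each token of a candidate span in turn and average the negative log-probability the model assigns to the original token. This page does not give the paper's exact formula, so the scoring rule, the `predict_proba` interface, and the toy probability table are all assumptions for illustration. A real system would replace `toy_predict_proba` with a pretrained masked language model.

```python
import math

MASK = "[MASK]"


def masked_lm_score(tokens, start, end, predict_proba):
    """Average negative log-probability of tokens[start:end], masked one at a time.

    predict_proba(masked_tokens, position) is a hypothetical interface that
    returns a dict mapping candidate tokens to probabilities for that position.
    Higher scores mean the span is less predictable from its context.
    """
    if end <= start:
        raise ValueError("span must contain at least one token")
    total = 0.0
    for pos in range(start, end):
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        probs = predict_proba(masked, pos)
        # Floor the probability so unseen tokens give a large but finite score.
        p = max(probs.get(tokens[pos], 0.0), 1e-9)
        total += -math.log(p)
    return total / (end - start)


def toy_predict_proba(masked_tokens, position):
    """Toy stand-in for a masked LM (assumption): looks only at the previous token."""
    table = {
        "chest": {"pain": 0.6, "x-ray": 0.3},
        "cardiac": {"arrest": 0.5, "tamponade": 0.02},
    }
    prev = masked_tokens[position - 1] if position > 0 else ""
    return table.get(prev, {})


familiar = masked_lm_score(["patient", "reports", "chest", "pain"], 3, 4, toy_predict_proba)
rare = masked_lm_score(["signs", "of", "cardiac", "tamponade"], 3, 4, toy_predict_proba)
```

Under these toy probabilities, `rare` is larger than `familiar`, which is the kind of signal a jargon detector could use as an extra feature alongside the token representation.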

Abstract

This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences (MedJ). Then, we introduce a novel medical jargon extraction (MedJEx) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our…
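
The auxiliary task described in the abstract treats hyperlinked text as spans worth explaining. As a rough sketch of how such training data might be derived, the function below turns wiki markup into token-level B/I/O labels, marking the visible text of each link as a span. The whitespace tokenizer, the B/I/O tag set, and the function name `hyperlink_spans_to_bio` are assumptions for illustration, not the authors' preprocessing pipeline.

```python
import re

# Matches [[Target]] and [[Target|surface text]] wiki links.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def hyperlink_spans_to_bio(wikitext):
    """Return (tokens, labels): B/I on hyperlinked tokens, O everywhere else."""
    tokens, labels = [], []
    cursor = 0
    for match in WIKI_LINK.finditer(wikitext):
        # Plain text before the link lies outside any span.
        for tok in wikitext[cursor:match.start()].split():
            tokens.append(tok)
            labels.append("O")
        # The visible surface text is the span; fall back to the target title.
        surface = match.group(2) or match.group(1)
        for i, tok in enumerate(surface.split()):
            tokens.append(tok)
            labels.append("B" if i == 0 else "I")
        cursor = match.end()
    # Remaining text after the last link.
    for tok in wikitext[cursor:].split():
        tokens.append(tok)
        labels.append("O")
    return tokens, labels


example = "The patient has [[Myocardial infarction|a heart attack]] and [[hypertension]] ."
tokens, labels = hyperlink_spans_to_bio(example)
```

A sequence labeler could be trained on pairs like these first and then fine-tuned on expert-annotated jargon labels, mirroring the two-stage setup the abstract describes.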

Code & Models

Repositories

mozzitastebitter/medjex
PyTorch, Official
