MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

Zhi Wen; Xing Han Lu; Siva Reddy

arXiv:2012.13978 · cs.CL · December 29, 2020

TL;DR

MeDAL is a large medical text dataset curated for abbreviation disambiguation; pre-training on it improves model performance and convergence speed when fine-tuning on downstream medical NLP tasks.

Contribution

This work introduces MeDAL, a new dataset for medical abbreviation disambiguation, and demonstrates its effectiveness in pre-training models for better downstream medical NLP tasks.

Findings

01

Pre-training on MeDAL improves model performance on medical tasks.

02

Pre-training accelerates convergence during fine-tuning.

03

Across several common architectures, models pre-trained on MeDAL outperform their baseline counterparts.

Abstract

One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.
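To make the task concrete, the sketch below shows one way an abbreviation-disambiguation example could be represented: a token sequence, the position of the abbreviation, and the expansion the model must recover. The abstract does not describe how MeDAL examples are constructed, so the substitution approach, the `make_example` helper, and the sample sentence are all illustrative assumptions rather than the authors' pipeline.

```python
import re


def make_example(sentence, expansion, abbreviation):
    """Build one hypothetical disambiguation example.

    Replaces the first occurrence of `expansion` in `sentence` with
    `abbreviation` and returns (tokens, location, label), where `location`
    is the token index of the abbreviation and `label` is the expansion
    to be predicted. Returns None if the expansion does not occur.
    """
    match = re.search(re.escape(expansion), sentence, flags=re.IGNORECASE)
    if match is None:
        return None
    # Whitespace tokenization is a simplification for this sketch.
    prefix = sentence[:match.start()].split()
    suffix = sentence[match.end():].split()
    tokens = prefix + [abbreviation] + suffix
    return tokens, len(prefix), expansion


# Illustrative sentence and abbreviation, not drawn from the dataset.
example = make_example(
    "Patients with dengue hemorrhagic fever were admitted.",
    "dengue hemorrhagic fever",
    "DHF",
)
print(example)
```

Under this framing, pre-training amounts to classifying the token at `location` into one of its candidate expansions, which forces the model to use the surrounding medical context.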

Code & Models

Repositories

McGill-NLP/medal
PyTorch · Official

Taxonomy

Methods: Linear Layer · WordPiece · Residual Connection · Layer Normalization · Attention Is All You Need · Multi-Head Attention · Dense Connections · Weight Decay · Linear Warmup With Linear Decay