Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Yikuan Li; Ramsey M. Wehbe; Faraz S. Ahmad; Hanyin Wang; Yuan Luo

arXiv:2201.11838 · cs.CL · April 18, 2022 · 68 cites

TL;DR

This paper introduces Clinical-Longformer and Clinical-BigBird, domain-specific long-sequence transformer models that outperform existing models like ClinicalBERT in various clinical NLP tasks by efficiently handling longer text inputs.

Contribution

The paper presents two novel clinical domain-specific long sequence transformer models, Clinical-Longformer and Clinical-BigBird, pre-trained on large-scale clinical data, with improved performance over prior models.

Findings

01

Both models outperform ClinicalBERT in all downstream tasks.

02

Models effectively handle longer clinical texts up to 4096 tokens.

03

Source code and models are publicly available.

Abstract

Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical knowledge-enriched model ClinicalBERT also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is their substantial memory consumption, which stems from the full self-attention mechanism. To overcome this, long-sequence transformer models such as Longformer and BigBird were proposed; their sparse attention mechanisms reduce memory usage from quadratic to linear in the sequence length. These models extended the maximum input sequence length from 512 to 4096 tokens, which enhanced their ability to model long-term dependencies and consequently achieved optimal results in a variety of tasks. Inspired by the success…
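To make the quadratic-versus-linear memory claim in the abstract concrete, the sketch below counts how many query-key attention scores would be computed under full self-attention and under a simple sliding-window pattern. It is an illustration only: the function names are hypothetical, the window size of 512 is an assumed value, and the pattern ignores the global tokens used by Longformer and the random and global blocks used by BigBird.

```python
def full_attention_entries(seq_len: int) -> int:
    """Number of query-key scores when every token attends to every token."""
    return seq_len * seq_len


def sliding_window_entries(seq_len: int, window: int) -> int:
    """Number of query-key scores when each token attends only to itself and
    up to window // 2 neighbours on each side (clipped at sequence edges).

    This is a simplified local-attention pattern, not the exact Longformer
    or BigBird attention layout.
    """
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)            # leftmost position token i can see
        hi = min(seq_len - 1, i + half)  # rightmost position token i can see
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    window = 512  # assumed window size, chosen for illustration
    for n in (512, 1024, 2048, 4096):
        full = full_attention_entries(n)
        local = sliding_window_entries(n, window)
        print(f"seq_len={n:5d}  full={full:10d}  local={local:10d}")
```

Under these assumptions, doubling the sequence length multiplies the full-attention count by four, while the sliding-window count grows by roughly a factor of two once the sequence is much longer than the window.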

Code & Models

Repositories

luoyuanlab/clinical-longformer
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · WordPiece · Dense Connections · Linear Warmup With Linear Decay · Dropout