Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage

Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang, Ricardo Henao, and Lawrence Carin

arXiv:2006.11991 · cs.CL · June 23, 2020 · 6 cites

TL;DR

This paper introduces LESA-BERT, a novel attention mechanism with label embeddings, and demonstrates its effectiveness on small healthcare datasets for classifying the urgency of patient messages, where it outperforms baselines.

Contribution

The paper proposes LESA-BERT, which adds label embeddings to the self-attention of each BERT layer, and distills it into smaller models to improve classification performance on small datasets.
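This page does not spell out how the label embeddings enter self-attention, so the following is only a minimal sketch of one plausible reading: token queries attend over the token sequence plus a set of label embeddings. The function names, the identity Q/K/V projections, and the choice to use tokens only as queries are all assumptions for illustration, not the authors' formulation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_augmented_attention(tokens: List[Vector], labels: List[Vector]) -> List[Vector]:
    """Hypothetical single-head self-attention in which each token query
    attends over the token sequence plus a set of label embeddings.

    Assumptions (not taken from the paper): identity Q/K/V projections,
    queries come from tokens only, keys and values = tokens followed by labels.
    """
    d = len(tokens[0])
    memory = tokens + labels  # keys/values: token vectors, then label vectors
    outputs = []
    for q in tokens:
        # Scaled dot-product scores against every token and every label.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        # Weighted sum of the value vectors, one output coordinate at a time.
        outputs.append([sum(w * v[i] for w, v in zip(weights, memory)) for i in range(d)])
    return outputs


# Two 2-d token vectors and three hypothetical label embeddings
# (standing in for non-urgent, medium, urgent).
tokens = [[1.0, 0.0], [0.0, 1.0]]
labels = [[0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]
contextualized = label_augmented_attention(tokens, labels)
```

In a full BERT layer one would expect learned projection matrices, multiple heads, and trainable label embeddings in every layer; the sketch drops all of that to show only how extra label vectors can share the attention softmax with the tokens.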

Findings

01

LESA-BERT outperforms several strong baseline classifiers by 4.3% in macro F1 score.

02

Distilling LESA-BERT into smaller variants reduces model size and aims to reduce overfitting.

03

Effective for small, imbalanced healthcare datasets.
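Macro F1, the metric behind the reported 4.3% margin, averages per-class F1 with equal weight, which is why it suits small, imbalanced data. The following minimal sketch computes it for the three urgency categories named in the abstract (non-urgent, medium, urgent); the function name and the toy labels are made up for illustration.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], classes: List[str]) -> float:
    """Unweighted mean of per-class F1 scores.

    Each class contributes equally regardless of how many examples it has,
    so a rare class that is never predicted correctly pulls the score down.
    """
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


classes = ["non-urgent", "medium", "urgent"]
# Made-up toy labels, skewed toward "non-urgent" to mimic class imbalance.
y_true = ["non-urgent"] * 4 + ["medium"] * 2 + ["urgent"]
y_pred = ["non-urgent"] * 4 + ["non-urgent", "medium"] + ["non-urgent"]
score = macro_f1(y_true, y_pred, classes)  # → 0.4888... ((0.8 + 2/3 + 0) / 3)
```

On this toy data, accuracy is 5/7 (about 0.71), but macro F1 is only about 0.49 because the single urgent message is missed, which is exactly the failure mode that matters for triage.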

Abstract

Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets. As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent. Experiments demonstrate that our approach can outperform several strong baseline classifiers by a significant margin of 4.3% in terms of macro F1 score. The code for this project is…
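The abstract says LESA-BERT is distilled into smaller variants but does not give the training objective here. As a stand-in, this minimal sketch uses the widely used soft-target distillation loss (Hinton et al., 2015): a temperature-softened KL term toward the teacher plus ordinary cross-entropy on the hard label. Whether the paper uses this exact loss is an assumption, and the function names, `temperature`, and `alpha` values are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """Soft-target knowledge distillation loss for one example (assumed form).

    alpha weights the softened teacher-matching term; (1 - alpha) weights
    the hard-label cross-entropy. The KL term is scaled by temperature**2
    so its gradient magnitude stays comparable as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy against the gold class at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical 3-class logits (non-urgent, medium, urgent) for one message.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.8, 0.2]
loss = distillation_loss(student, teacher, label=0)
```

A higher temperature exposes more of the teacher's relative confidence across the non-target classes, which is the extra signal a small student can learn from when labeled data is scarce.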

Code & Models

Repositories

shijing001/text_classifiers (TensorFlow, official)

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Text Readability and Simplification

Methods: Linear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections