Students Need More Attention: BERT-based AttentionModel for Small Data with Application to AutomaticPatient Message Triage
Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang,, Ricardo Henao, and Lawrence Carin

TL;DR
This paper introduces LESA-BERT, a novel attention mechanism with label embeddings, and demonstrates its effectiveness in small healthcare datasets for classifying patient message urgency, outperforming baselines.
Contribution
The paper proposes LESA-BERT with label embeddings for self-attention and distillation to smaller models, improving small dataset classification performance.
Findings
LESA-BERT outperforms baseline classifiers by 4.3% macro F1 score.
Distilled LESA-BERT reduces overfitting and model size.
Effective for small, imbalanced healthcare datasets.
Abstract
Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models. So motivated, we propose a novel framework based on BioBERT (Bidirectional Encoder Representations from Transformers forBiomedical TextMining). Specifically, (i) we introduce Label Embeddings for Self-Attention in each layer of BERT, which we call LESA-BERT, and (ii) by distilling LESA-BERT to smaller variants, we aim to reduce overfitting and model size when working on small datasets. As an application, our framework is utilized to build a model for patient portal message triage that classifies the urgency of a message into three categories: non-urgent, medium and urgent. Experiments demonstrate that our approach can outperform several strong baseline classifiers by a significant margin of 4.3% in terms of macro F1 score. The code for this project is…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Machine Learning in Healthcare · Text Readability and Simplification
MethodsLinear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Refunds@Expedia|||How do I get a full refund from Expedia? · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections
