On the Effectiveness of Compact Biomedical Transformers
Omid Rohanian, Mohammadmahdi Nouriborji, Samaneh Kouchaki, David A. Clifton

TL;DR
This paper introduces six lightweight biomedical transformer models created through knowledge distillation and continual learning, achieving comparable performance to larger models while significantly reducing resource requirements.
Contribution
The paper presents six novel compact biomedical transformers that match larger models' performance, developed via knowledge distillation and continual learning techniques.
Findings
Models perform on par with BioBERT-v1.1 on biomedical tasks.
All models are publicly available for research and practical use.
Significant reduction in computational resources needed for biomedical NLP.
Abstract
Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing (NLP) community has developed numerous strategies to compress these models utilising techniques such as pruning, quantisation, and knowledge distillation, resulting in models that are considerably faster, smaller, and subsequently easier to use in practice. By the same token, in this paper we introduce six lightweight models, namely, BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or continual learning on the PubMed…
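To make the idea of knowledge distillation from a teacher more concrete, here is a minimal sketch of the generic soft-target distillation loss: a temperature-scaled KL divergence between the teacher's and the student's output distributions. This is an illustration under stated assumptions, not the paper's training objective, which is not given in this summary. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    Assumption: the generic soft-target formulation, scaled by T^2 so the
    gradient magnitude stays comparable across temperatures. The compact
    models in the paper may use different or additional objectives.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2


# Hypothetical logits for a three-class prediction.
teacher = [1.0, 2.0, 3.0]
matching_student = [1.0, 2.0, 3.0]
disagreeing_student = [3.0, 2.0, 1.0]

loss_match = distillation_loss(matching_student, teacher)
loss_disagree = distillation_loss(disagreeing_student, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. In practice this term is usually combined with the ordinary task loss on the ground-truth labels.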
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging
Methods: Knowledge Distillation
