BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition
Vera Pavlova, Mohammed Makhlouf

TL;DR
This paper introduces BIOptimus, a biomedical language model pre-trained with curriculum learning, with weights for new tokens initialized by distilling existing weights. It reports state-of-the-art results on biomedical NER tasks.
Contribution
The paper proposes a new pre-training method combining curriculum learning and weight distillation, improving biomedical NER performance and pre-training efficiency.
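As a rough illustration of the curriculum learning idea mentioned above, the sketch below orders training examples from easy to hard and splits them into stages. It is not the paper's curriculum: the function names `curriculum_stages` and `token_count`, and the use of sentence length as a difficulty proxy, are assumptions made only for this example.

```python
from typing import Callable, List, Sequence


def curriculum_stages(
    examples: Sequence[str],
    difficulty: Callable[[str], float],
    num_stages: int,
) -> List[List[str]]:
    """Split examples into stages ordered from easiest to hardest."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Rank every example by the caller-supplied difficulty score.
    ranked = sorted(examples, key=difficulty)
    # Ceiling division so every example lands in some stage.
    stage_size = max(1, -(-len(ranked) // num_stages))
    return [ranked[i:i + stage_size] for i in range(0, len(ranked), stage_size)]


def token_count(sentence: str) -> float:
    # Hypothetical difficulty proxy (an assumption): longer sentences are harder.
    return float(len(sentence.split()))
```

A training loop would then consume the stages in order, so the model sees the easier examples first. Any other difficulty score can be passed in place of `token_count`.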
Findings
BIOptimus outperforms existing biomedical LMs on NER tasks.
Pre-training with curriculum learning enhances model performance.
Weight distillation accelerates pre-training and boosts accuracy.
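To make the idea of initializing new-token weights from existing ones more concrete, here is a minimal sketch of a common simplification: a new vocabulary token starts from the mean of the embeddings of the subword pieces it replaces. This is a stand-in technique, not the distillation procedure the paper proposes, and the names `mean_of_pieces` and `extend_vocabulary` as well as the toy vectors are hypothetical.

```python
from typing import Dict, List


def mean_of_pieces(pieces: List[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the existing embedding vectors of a new token's subword pieces."""
    if not pieces:
        raise ValueError("a new token needs at least one known piece")
    vectors = [embeddings[p] for p in pieces]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def extend_vocabulary(
    embeddings: Dict[str, List[float]],
    new_tokens: Dict[str, List[str]],
) -> Dict[str, List[float]]:
    """Return a copy of the embedding table with the new tokens added."""
    extended = dict(embeddings)
    for token, pieces in new_tokens.items():
        # Assumption: each new token maps to pieces already in the old vocabulary.
        extended[token] = mean_of_pieces(pieces, embeddings)
    return extended
```

With toy two-dimensional vectors, a new token `cardio` built from the pieces `card` and `##io` would start at the midpoint of those two vectors instead of at a random point.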
Abstract
Using language models (LMs) pre-trained in a self-supervised setting on large corpora and then fine-tuning for a downstream task has helped to deal with the problem of limited labeled data for supervised learning tasks such as Named Entity Recognition (NER). Recent research in biomedical language processing has offered a number of biomedical LMs pre-trained using different methods and techniques that advance results on many BioNLP tasks, including NER. However, there is still a lack of a comprehensive comparison of pre-training approaches that would work best in the biomedical domain. This paper aims to investigate different pre-training methods, such as pre-training the biomedical LM from scratch and pre-training it in a continued fashion. We compare existing methods with our proposed pre-training method of initializing weights for new tokens by distilling existing weights from…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Adam · Attention Dropout · Linear Layer · Layer Normalization · Residual Connection · Dense Connections · Softmax · Weight Decay
