BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines
Lisa Kühnel, Alexander Schulz, Barbara Hammer, Juliane Fluck

TL;DR
BERT WEAVER introduces a weight averaging technique that lets transformer models learn continuously from new biomedical data while reducing catastrophic forgetting of previously learned knowledge.
Contribution
The paper proposes WEAVER, a simple post-processing method for lifelong learning in transformer models, reducing catastrophic forgetting without retraining from scratch.
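To make the idea of weight averaging concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each model is represented as a dictionary mapping parameter names to flat lists of floats, and that the old and new models are averaged with coefficients proportional to the number of training examples each has seen. The function name `average_weights`, the `n_old`/`n_new` arguments, and the size-proportional weighting are illustrative assumptions, not details confirmed by this summary.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(old: Weights, new: Weights, n_old: int, n_new: int) -> Weights:
    """Blend two models with identical architecture by weighted averaging.

    Assumption: the mixing coefficient is proportional to the amount of
    training data behind each model (n_old vs. n_new).
    """
    if old.keys() != new.keys():
        raise ValueError("models must share the same parameter names")
    total = n_old + n_new
    if n_old < 0 or n_new < 0 or total == 0:
        raise ValueError("training set sizes must be non-negative and not both zero")

    alpha = n_old / total  # share of the old model in the merged weights
    merged: Weights = {}
    for name, old_vals in old.items():
        new_vals = new[name]
        if len(old_vals) != len(new_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of old and new parameters.
        merged[name] = [alpha * o + (1.0 - alpha) * n
                        for o, n in zip(old_vals, new_vals)]
    return merged


# Toy example: the old model saw 1 example, the new model saw 3.
old_model = {"w": [0.0, 2.0]}
new_model = {"w": [4.0, 2.0]}
merged_model = average_weights(old_model, new_model, n_old=1, n_new=3)
```

In a sequential setting, the merged model would take the role of the old model the next time a new corpus arrives, with `n_old` updated to the running total of examples seen so far. This is again an assumption about how such a scheme could be applied, not a description of the paper's exact procedure.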
Findings
WEAVER achieves similar performance to combined training on all data.
The method is computationally efficient compared to full retraining.
WEAVER is also applicable to federated learning scenarios involving biomedical data.
Abstract
Recent developments in transfer learning have boosted advancements in natural language processing tasks. Performance, however, depends on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that one training corpus is not enough to learn generic models that can predict efficiently on new data. Therefore, to be usable in real-world applications, state-of-the-art models need the ability of lifelong learning to improve performance as soon as new data are available, without the need to retrain the whole model from scratch. We present WEAVER, a simple yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all…
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare
