An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training

Kristjan Arumae, Qing Sun, and Parminder Bhatia

arXiv:2010.00784·cs.CL·October 5, 2020

TL;DR

This paper empirically investigates known methods for mitigating catastrophic forgetting during staged multi-domain pre-training of large language models. Elastic weight consolidation (EWC) gives the best overall scores, with only a 0.33% drop in performance across seven generic tasks.

Contribution

The paper systematically evaluates known techniques for reducing catastrophic forgetting in multi-domain language model pre-training and highlights the effectiveness of elastic weight consolidation.

Findings

01

Elastic weight consolidation yields minimal performance drop (0.33%) on generic tasks.

02

EWC remains competitive on bio-medical tasks.

03

Gradient and latent clustering improve data coverage in mitigation methods.

Abstract

Pre-training large language models has become a standard in the natural language processing community. Such models are pre-trained on generic data (e.g. BookCorpus and English Wikipedia) and often fine-tuned on tasks in the same domain. However, in order to achieve state-of-the-art performance on out-of-domain tasks such as clinical named entity recognition and relation extraction, additional in-domain pre-training is required. In practice, staged multi-domain pre-training presents performance deterioration in the form of catastrophic forgetting (CF) when evaluated on a generic benchmark such as GLUE. In this paper we conduct an empirical investigation into known methods to mitigate CF. We find that elastic weight consolidation provides the best overall scores, yielding only a 0.33% drop in performance across seven generic tasks while remaining competitive in bio-medical tasks. Furthermore,…
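
To make the elastic weight consolidation idea concrete, here is a minimal sketch of the commonly used quadratic EWC penalty, which adds (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2 to the loss on the new domain. It is an illustration under stated assumptions, not the authors' implementation: the function names `ewc_penalty` and `ewc_total_loss`, the flat parameter lists, and the diagonal Fisher estimate passed in as `fisher_diag` are all hypothetical, and the paper's exact formulation and hyperparameters are not given in this summary.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameters while training on the new domain
    anchor_params -- parameters saved after pre-training on the generic domain
    fisher_diag   -- diagonal Fisher information estimated on the generic domain
                     (assumed to be precomputed; its estimation is not shown here)
    lam           -- penalty strength (hypothetical hyperparameter)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("params, anchor_params and fisher_diag must have equal length")
    weighted_sq_dist = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )
    return 0.5 * lam * weighted_sq_dist


def ewc_total_loss(
    task_loss: float,
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """New-domain loss plus the EWC penalty that discourages drifting from the anchor."""
    return task_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)
```

Parameters with a large Fisher value are treated as important for the generic domain and are penalised more heavily for moving, while parameters with a small Fisher value remain free to adapt to the in-domain data.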

Code & Models

Repositories

aws-health-ai/multi_domain_lm
PyTorch · Official

Taxonomy

Methods: Experience Replay