Towards Robust and Efficient Continual Language Learning
Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato

TL;DR
This paper introduces a new benchmark for continual language learning that tests whether models can exploit positive transfer while avoiding negative transfer, and it proposes a selective initialization strategy for improved adaptation.
Contribution
It presents a comprehensive benchmark covering various transfer scenarios and proposes a simple, effective continual learning method based on selective checkpoint initialization.
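The idea of choosing which checkpoint to start from can be illustrated with a minimal sketch. It assumes, hypothetically, that every candidate (the base model plus each past-task checkpoint) can be scored on the new task by some cheap probe, such as validation accuracy after a short fine-tuning run. `select_initialization` and `probe_score` are illustrative names, not the authors' code, and the scoring rule is an assumption rather than the paper's actual selection criterion.

```python
from typing import Callable, Dict


def select_initialization(
    candidates: Dict[str, object],
    probe_score: Callable[[object], float],
    base_name: str = "base",
) -> str:
    """Return the name of the checkpoint to initialize the new task from.

    `candidates` maps a (hypothetical) checkpoint name to an opaque checkpoint
    handle. `probe_score` is a hypothetical estimate of how well a checkpoint
    will do on the new task. The base model is always a candidate, so a past
    checkpoint is chosen only when it is expected to beat the base model,
    which is one simple way to guard against negative transfer.
    """
    if base_name not in candidates:
        raise ValueError("the base checkpoint must be among the candidates")
    scores = {name: probe_score(ckpt) for name, ckpt in candidates.items()}
    best = max(scores, key=scores.get)
    # Prefer the base model unless a past checkpoint is strictly better.
    if scores[best] <= scores[base_name]:
        return base_name
    return best


# Toy usage: the "checkpoints" are just floats standing in for probe results.
toy = {"base": 0.70, "after_task_1": 0.78, "after_task_2": 0.62}
print(select_initialization(toy, probe_score=lambda s: s))  # after_task_1
```

Keeping the base model in the candidate set means the learner falls back to starting fresh whenever no past task looks helpful, rather than blindly continuing from the latest checkpoint.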
Findings
Benchmark covers diverse transfer scenarios.
Selective initialization improves transfer effectiveness.
Framework helps analyze positive and negative transfer in language models.
Abstract
As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing more harm than good, i.e., negative transfer. In this paper, we construct a new benchmark of task sequences that target different possible transfer scenarios one might face, such as a sequence of tasks with high potential for positive transfer, high potential for negative transfer, no expected effect, or a mixture of each. An ideal learner should be able to maximally exploit information from all tasks that have any potential for positive transfer, while also avoiding the negative effects of any…
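The three outcomes named in the abstract (positive transfer, negative transfer, no effect) can be made concrete with a small sketch. It assumes transfer is measured as the score difference between fine-tuning from a past-task checkpoint and fine-tuning from the base model, and it uses a hypothetical `tolerance` margin to treat small differences as no effect; neither the metric nor the threshold is taken from the paper.

```python
def transfer_gain(score_with_transfer: float, score_from_base: float) -> float:
    """Score difference when starting from a past-task checkpoint instead of
    the base model; a positive value means the past tasks helped."""
    return score_with_transfer - score_from_base


def label_transfer(gain: float, tolerance: float = 0.01) -> str:
    """Label a gain as positive, negative, or neutral transfer.

    `tolerance` is a hypothetical margin below which the difference is
    treated as noise (no expected effect).
    """
    if gain > tolerance:
        return "positive"
    if gain < -tolerance:
        return "negative"
    return "neutral"


# Toy usage with made-up scores.
print(label_transfer(transfer_gain(0.81, 0.75)))  # positive
print(label_transfer(transfer_gain(0.70, 0.75)))  # negative
```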
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling
