Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

Sudipta Kar; Giuseppe Castellucci; Simone Filice; Shervin Malmasi; Oleg Rokhlenko

arXiv:2302.11074·cs.CL·February 23, 2023

TL;DR

This paper introduces a distillation-based method to incrementally expand multi-task learning models in NLP, effectively preventing catastrophic forgetting and maintaining performance on old tasks while learning new ones.

Contribution

The paper proposes a knowledge distillation approach that uses unlabeled data to prevent forgetting in continual learning of NLP tasks, which avoids the cost of retraining from scratch on all tasks.

Findings

01
 
Prevents performance drops of up to 20% on old tasks.
 
02
 
Is effective in a practical voice assistant scenario.
 
03
 
Maintains high performance on newly added tasks.

Abstract

Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model. Training an MTL model requires the training data for all tasks to be available at the same time. As systems usually evolve over time (e.g., to support new functionalities), adding a new task to an existing MTL model usually requires retraining the model from scratch on all the tasks, which can be time-consuming and computationally expensive. Moreover, in some scenarios, the data used to train the original model may no longer be available, for example, due to storage or privacy concerns. In this paper, we approach the problem of incrementally expanding MTL models' capability to solve new tasks over time by distilling the knowledge of a model already trained on n tasks into a new one for solving n+1 tasks. To avoid catastrophic…
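
The distillation idea in the abstract can be illustrated with a small sketch. The truncated abstract does not state the paper's actual objective, so the code below is a generic knowledge-distillation loss and not the authors' formulation. It assumes an old model (teacher) trained on n tasks, a new model (student) with n+1 task heads, a KL term that pulls the student's old-task predictions toward the teacher's on unlabeled inputs, and a cross-entropy term on labeled data for the new task. The function names, the `alpha` weight, and the `temperature` parameter are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of unlabeled examples
    for a single old task (assumed form, not taken from the paper)."""
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)


def supervised_loss(student_logits, labels, eps=1e-12):
    """Mean cross-entropy on labeled examples of the newly added task."""
    losses = [
        -math.log(softmax(logits)[y] + eps)
        for logits, y in zip(student_logits, labels)
    ]
    return sum(losses) / len(losses)


def combined_loss(old_task_batches, new_task_batch, alpha=0.5, temperature=2.0):
    """Weighted sum of old-task distillation and new-task supervision.

    old_task_batches: dict mapping a task name to a pair
        (teacher_logits, student_logits) computed on unlabeled inputs.
    new_task_batch: pair (student_logits, gold_labels) for task n+1.
    alpha: hypothetical weight trading off retention against new learning.
    """
    distill = sum(
        distillation_loss(t, s, temperature) for t, s in old_task_batches.values()
    ) / max(len(old_task_batches), 1)
    new_logits, labels = new_task_batch
    return alpha * distill + (1.0 - alpha) * supervised_loss(new_logits, labels)


if __name__ == "__main__":
    # Toy logits: one old task with two unlabeled examples, one new task.
    old = {"intent": ([[2.0, 0.5], [0.1, 1.5]], [[1.8, 0.7], [0.3, 1.2]])}
    new = ([[0.2, 1.1, -0.4]], [1])
    print(combined_loss(old, new))
```

In this sketch no old-task labels are needed: the teacher's soft predictions on unlabeled text act as the training signal for the old task heads, which fits the scenario where the original training data is no longer available.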
