Unlocking Continual Learning Abilities in Language Models

Wenyu Du; Shuang Cheng; Tongxu Luo; Zihan Qiu; Zeyu Huang; Ka Chun Cheung; Reynold Cheng; Jie Fu

arXiv:2406.17245 · cs.LG · October 8, 2024

TL;DR

This paper introduces MIGU, a rehearsal-free, task-label-free continual learning method for language models that uses the magnitude distribution of linear-layer outputs to mitigate catastrophic forgetting, and reports state-of-the-art results.

Contribution

MIGU is the first method to utilize output magnitude distributions for continual learning in LMs without requiring old data or task labels.

Findings

01. MIGU improves average accuracy by 15.2% over baselines.

02. It is applicable to T5, RoBERTa, and Llama2 architectures.

03. MIGU enhances performance across multiple CL benchmarks.

Abstract

Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the…
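
The idea stated in the abstract (update only the parameters whose linear-layer outputs have large magnitude) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The helper names, the `keep_ratio` parameter, the row-wise masking, and the plain SGD step are all assumptions made for illustration; the abstract excerpt above does not specify them.

```python
# Illustrative sketch only: function names, keep_ratio, and row-wise masking
# are assumptions, not details taken from the paper or its repository.

def linear_forward(weight, x):
    """Compute y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def l1_normalized_magnitude(output):
    """Return |y_i| / sum_j |y_j| for each output unit (zeros if y is all zero)."""
    total = sum(abs(v) for v in output)
    if total == 0.0:
        return [0.0 for _ in output]
    return [abs(v) / total for v in output]


def top_magnitude_mask(scores, keep_ratio):
    """Mark the output units with the largest scores (assumed selection rule)."""
    k = max(1, int(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]


def masked_sgd_step(weight, grad, mask, lr):
    """Apply an SGD step only to weight rows whose output unit is selected."""
    return [
        [w - lr * g if keep else w for w, g in zip(w_row, g_row)]
        for w_row, g_row, keep in zip(weight, grad, mask)
    ]


# Toy layer with 4 output units and 2 inputs; the gradient is a placeholder.
weight = [[1.0, 0.0], [0.0, 0.1], [2.0, 1.0], [0.5, -0.5]]
x = [1.0, 1.0]
grad = [[1.0, 1.0] for _ in weight]

output = linear_forward(weight, x)
scores = l1_normalized_magnitude(output)
mask = top_magnitude_mask(scores, keep_ratio=0.5)
new_weight = masked_sgd_step(weight, grad, mask, lr=0.1)
```

In this toy run the two output units with the largest normalized magnitude (rows 0 and 2) receive the update, while rows 1 and 3 keep their original weights.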

Code & Models

Repositories

wenyudu/migu (PyTorch, official)

Taxonomy

Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Weight Decay · Residual Connection · Multi-Head Attention · WordPiece · Softmax · Layer Normalization