Unlocking Continual Learning Abilities in Language Models
Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu

TL;DR
This paper introduces MIGU, a novel rehearsal-free, task-label-free continual learning method for language models that leverages output magnitude distributions to prevent catastrophic forgetting, achieving state-of-the-art results.
Contribution
MIGU is the first method to utilize output magnitude distributions for continual learning in LMs without requiring old data or task labels.
Findings
MIGU improves average accuracy by 15.2% over baselines.
It is applicable to T5, RoBERTa, and Llama2 architectures.
MIGU enhances performance across multiple CL benchmarks.
Abstract
Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the…
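The core idea stated in the abstract, updating only the parameters of a linear layer whose outputs have large magnitude, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that selection happens per output unit (one weight row per output), that scores are the L1-normalized absolute outputs, and that a hypothetical `keep_ratio` hyperparameter sets the fraction of units that may change. The function names are also illustrative.

```python
def linear(weights, x):
    """Bias-free linear layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def l1_normalized_magnitudes(outputs):
    """Return |y_i| / sum_j |y_j| for each output activation."""
    total = sum(abs(y) for y in outputs)
    if total == 0:
        return [0.0 for _ in outputs]
    return [abs(y) / total for y in outputs]


def magnitude_mask(outputs, keep_ratio):
    """Select the output units with the largest normalized magnitudes.

    keep_ratio is a hypothetical hyperparameter (an assumption of this
    sketch): the fraction of output units whose weights may be updated.
    """
    scores = l1_normalized_magnitudes(outputs)
    k = max(1, round(keep_ratio * len(outputs)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(outputs))]


def masked_sgd_step(weights, grads, mask, lr):
    """Plain SGD applied only to rows whose output unit was selected."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if selected else list(w_row)
        for w_row, g_row, selected in zip(weights, grads, mask)
    ]


# Toy example: 4 output units, 2 input features.
weights = [[0.5, -0.5], [2.0, 1.0], [0.1, 0.0], [-3.0, 1.0]]
x = [1.0, 2.0]
outputs = linear(weights, x)            # absolute values: 0.5, 4.0, 0.1, 1.0
mask = magnitude_mask(outputs, 0.5)     # mask is [False, True, False, True]

# Placeholder gradients; in practice these come from backpropagation.
grads = [[1.0, 1.0] for _ in weights]
updated = masked_sgd_step(weights, grads, mask, lr=0.1)
```

In this toy run only the second and fourth rows change, while the rows with small output magnitude keep their old values. No old-task data or task label enters the computation, which matches the rehearsal-free, task-label-free setting the abstract describes.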
Taxonomy
Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Weight Decay · Residual Connection · Multi-Head Attention · WordPiece · Softmax · Layer Normalization
