Does Continual Learning Equally Forget All Parameters?
Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

TL;DR
This paper investigates which neural network modules are most prone to forgetting in continual learning and proposes efficient finetuning methods that target those task-specific modules, improving accuracy while reducing computational cost.
Contribution
It introduces an analysis of module-specific forgetting and proposes the FPF and k-FPF methods, which improve the efficiency and accuracy of continual learning.
Findings
Finetuning only the task-specific modules on a small buffer mitigates forgetting.
k-FPF achieves comparable performance to FPF with less computation.
The proposed methods outperform state-of-the-art continual learning approaches.
Abstract
Distribution shift (e.g., task or domain shift) in continual learning (CL) usually results in catastrophic forgetting of neural networks. Although it can be alleviated by repeatedly replaying buffered data, the every-step replay is time-consuming. In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL. Our proposed metrics show that only a few modules are more task-specific and sensitively alter between tasks, while others can be shared across tasks as common knowledge. Hence, we attribute forgetting mainly to the former and find that finetuning them only on a small buffer at the end of any CL method can bring non-trivial improvement. Due to the small number of finetuned parameters, such "Forgetting Prioritized Finetuning (FPF)" is efficient in computation. We further propose a more efficient and simpler…
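The idea of ranking modules by how sharply they change between tasks can be illustrated with a small sketch. This is not the paper's metric or implementation: the drift score (mean absolute parameter change between two checkpoints), the function names `module_drift` and `select_modules`, and the toy numbers are all assumptions made for illustration.

```python
from typing import Dict, List

# A checkpoint maps a module name to its flattened parameters.
Checkpoint = Dict[str, List[float]]


def module_drift(before: Checkpoint, after: Checkpoint) -> Dict[str, float]:
    """Mean absolute parameter change per module between two checkpoints.

    Assumed stand-in for a task-sensitivity score; not the paper's metric.
    """
    drift = {}
    for name, old in before.items():
        new = after[name]
        drift[name] = sum(abs(n - o) for n, o in zip(new, old)) / len(old)
    return drift


def select_modules(drift: Dict[str, float], k: int) -> List[str]:
    """Return the k modules with the largest drift, largest first."""
    return sorted(drift, key=drift.get, reverse=True)[:k]


# Toy checkpoints taken before and after a task switch (made-up values).
before = {
    "conv1": [0.10, -0.20, 0.30],
    "bn1": [1.00, 0.00],
    "fc": [0.50, -0.50],
}
after = {
    "conv1": [0.11, -0.19, 0.30],
    "bn1": [1.40, -0.30],
    "fc": [0.90, -0.10],
}

drift = module_drift(before, after)
selected = select_modules(drift, k=2)
print(selected)
```

In a setup like this, only the selected modules would be unfrozen and finetuned on the small replay buffer after the main CL training finishes, while the remaining modules stay fixed as shared knowledge.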
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
