A Closer Look at Rehearsal-Free Continual Learning
James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira

TL;DR
This paper investigates rehearsal-free continual learning and shows that parameter regularization, L2 regularization in particular, can mitigate catastrophic forgetting without data rehearsal, outperforming EWC and feature distillation.
Contribution
The paper challenges the belief that parameter regularization fails in rehearsal-free continual learning and shows that L2 regularization is effective, especially in Vision Transformers (ViT) on ImageNet-R.
Findings
L2 parameter regularization outperforms EWC and feature distillation.
Strong continual learning performance is achievable without rehearsing previously seen data.
L2 regularization in self-attention blocks improves continual learning results.
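To make the comparison in these findings concrete, here is a minimal sketch of the two parameter-regularization penalties in their standard textbook forms. It is not the authors' implementation: the function names, the flat-list representation of weights, and the `strength` coefficient are all assumptions made for illustration. Both penalties pull the current weights toward the weights saved after the previous task; EWC additionally scales each squared difference by a per-parameter importance estimate (the diagonal Fisher information).

```python
from typing import Sequence


def l2_penalty(params: Sequence[float],
               anchor: Sequence[float],
               strength: float) -> float:
    """L2 parameter regularization (hypothetical helper).

    Penalizes squared drift of the current weights `params` away from
    `anchor`, the weights saved after the previous task.
    """
    return strength * sum((p - a) ** 2 for p, a in zip(params, anchor))


def ewc_penalty(params: Sequence[float],
                anchor: Sequence[float],
                fisher: Sequence[float],
                strength: float) -> float:
    """EWC-style penalty (hypothetical helper).

    Same squared drift as above, but each term is weighted by `fisher`,
    a per-parameter importance estimate assumed to be precomputed.
    """
    return strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher)
    )


# Toy values, chosen only to illustrate the arithmetic.
current = [1.0, 2.0]
previous = [0.0, 0.0]

l2_value = l2_penalty(current, previous, strength=0.5)
ewc_uniform = ewc_penalty(current, previous, [1.0, 1.0], strength=0.5)
ewc_weighted = ewc_penalty(current, previous, [2.0, 0.0], strength=0.5)
```

With a uniform importance of 1 for every parameter, the EWC penalty reduces to the L2 penalty, so in this sketch L2 can be read as EWC without the importance weighting. In training, either penalty would be added to the loss on the new task.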
Abstract
Continual learning is a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes which may disappear from the training data for extended periods of time (a phenomenon known as the catastrophic forgetting problem). Current approaches for continual learning of a single expanding task (aka class-incremental continual learning) require extensive rehearsal of previously seen data to avoid this degradation of knowledge. Unfortunately, rehearsal comes at a cost to memory, and it may also violate data-privacy. Instead, we explore combining knowledge distillation and parameter regularization in new ways to achieve strong continual learning performance without rehearsal. Specifically, we take a deep dive into common continual learning techniques: prediction distillation,…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Elastic Weight Consolidation · Knowledge Distillation
