Towards guarantees for parameter isolation in continual learning
Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

TL;DR
This paper investigates catastrophic forgetting in continual learning through the geometry of neural network loss landscapes, and provides theoretical guarantees against forgetting for some parameter isolation methods.
Contribution
It offers a unifying geometric perspective on parameter isolation algorithms and establishes provable guarantees against catastrophic forgetting.
Findings
Loss landscape geometry relates to forgetting
Guarantees established for certain parameter isolation methods
Unifying framework for continual learning algorithms
Abstract
Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catastrophic forgetting are lacking. In this work, we study the relationship between learning and forgetting by looking at the geometry of neural networks' loss landscape. We offer a unifying perspective on a family of continual learning algorithms, namely methods based on parameter isolation, and we establish guarantees on catastrophic forgetting for some of them.
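To make the idea of parameter isolation concrete, here is a minimal sketch in plain Python. It is an illustration under assumed conditions, not the paper's method or experiments: the task-1 loss is taken to be exactly quadratic around its minimum, and the Hessian `H1`, the minimum `theta1`, the task-2 `update`, and the binary `mask` are all hypothetical values chosen for the example. Forgetting is measured as the increase in the task-1 loss after the task-2 update.

```python
# Toy illustration (assumed setup, not taken from the paper).
# Task 1 has a quadratic loss around its minimum theta1:
#   L1(theta) = 0.5 * (theta - theta1)^T H1 (theta - theta1)

def quadratic_loss(theta, minimum, hessian):
    """Evaluate 0.5 * d^T H d with d = theta - minimum."""
    delta = [t - m for t, m in zip(theta, minimum)]
    h_delta = [sum(h * d for h, d in zip(row, delta)) for row in hessian]
    return 0.5 * sum(d * hd for d, hd in zip(delta, h_delta))

def forgetting(theta_before, theta_after, minimum, hessian):
    """Increase in the old task's loss caused by moving the parameters."""
    return (quadratic_loss(theta_after, minimum, hessian)
            - quadratic_loss(theta_before, minimum, hessian))

# Hypothetical values: task 1 only depends on parameter 0 (zero curvature along parameter 1).
H1 = [[2.0, 0.0],
      [0.0, 0.0]]
theta1 = [1.0, -1.0]

update = [0.5, 0.5]  # hypothetical update proposed while training on task 2
mask = [0.0, 1.0]    # isolation mask: freeze parameter 0, allow parameter 1 to move

# Unconstrained update moves both parameters.
theta_free = [t + u for t, u in zip(theta1, update)]
# Isolated update only moves the parameter that task 1 does not use.
theta_isolated = [t + m * u for t, m, u in zip(theta1, mask, update)]

forgetting_free = forgetting(theta1, theta_free, theta1, H1)
forgetting_isolated = forgetting(theta1, theta_isolated, theta1, H1)

print("forgetting without isolation:", forgetting_free)
print("forgetting with isolation:   ", forgetting_isolated)
```

In this toy case the masked update stays in the directions where the task-1 loss has zero curvature, so the task-1 loss does not change, while the unmasked update increases it. Real networks are not exactly quadratic, so this only illustrates the geometric intuition.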
Peer Reviews
Decision: Submitted to ICLR 2024
- The authors construct a theoretical framework for parameter isolation strategies through the lens of the loss landscape.
- Several seminal continual learning methods can be analyzed and explained with their framework.
- My major concern is the significance or novelty of the submission. The framework the authors formulate, i.e., analyzing null forgetting in continual learning through the loss landscape, is not particularly new given some previous works. Besides, though the authors analyze some continual learning methods with their framework, it would be more appreciated if a new continual learning method could be proposed, guided by the theoretical framework. The current presentation also makes the experimental part wea
The theoretical framework is powerful enough to include many existing continual learning techniques.
The second-order analysis is restricted to the case when the learning rate is small.
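The restriction mentioned in this review can be seen from a generic second-order Taylor expansion. The notation below is an assumption made for illustration and is not copied from the paper: $L_o$ is the loss of an old task $o$, $\theta_o$ the parameters reached after training on it, $\theta_t$ the parameters after a later task $t$, $H_o$ the Hessian of $L_o$ at $\theta_o$, and $\Delta = \theta_t - \theta_o$.

```latex
\begin{align}
  L_o(\theta_t) - L_o(\theta_o)
    &= \nabla L_o(\theta_o)^{\top} \Delta
     + \tfrac{1}{2}\, \Delta^{\top} H_o\, \Delta
     + \mathcal{O}\!\left(\lVert \Delta \rVert^{3}\right) \\
    &\approx \tfrac{1}{2}\, \Delta^{\top} H_o\, \Delta
     \qquad \text{if } \nabla L_o(\theta_o) \approx 0 \text{ and } \lVert \Delta \rVert \text{ is small.}
\end{align}
```

The quadratic term is a good proxy for forgetting only while the remainder $\mathcal{O}(\lVert \Delta \rVert^{3})$ is negligible. Under gradient-based training the size of $\Delta$ typically grows with the learning rate, which is presumably why the reviewer describes the analysis as limited to small learning rates.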
1. It is interesting to try and theoretically explain why parameter isolation methods may work (or not work) well. Past works have focussed mostly on regularisation and memory based methods.
2. As far as I can tell (aside from one small question, listed later), the derivations of the theorems are correct / intuitively make sense to me.
3. I found the perturbation analysis technique in Section 5.1 interesting.
1. At the end of Section 1.1, the paper claims that minimising average forgetting is equivalent to minimising the multi-task loss. I do not think this is true: there is a missing L_t(\theta), i.e., the loss on the current task t. I don't think this was used anywhere, so this should be an easy fix. However, in Section 5, the authors say, "we report forgetting in terms of accuracy", which I did not understand. What does this mean (because forgetting is not equal to accuracy)?
2. I'm not sure if OGD
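The reviewer's first point about the missing $L_t(\theta)$ term can be spelled out as follows. The definitions here are assumptions for illustration, since the paper's exact definitions are not reproduced on this page: forgetting on an old task $o$ is the increase of its loss relative to the value at $\theta_o$, average forgetting is taken over the previous tasks $o < t$, and the multi-task loss sums the losses of all tasks up to and including $t$.

```latex
\begin{align}
  \bar{E}_t(\theta)
    &= \frac{1}{t-1} \sum_{o=1}^{t-1} \bigl( L_o(\theta) - L_o(\theta_o) \bigr), \\
  \arg\min_{\theta} \bar{E}_t(\theta)
    &= \arg\min_{\theta} \sum_{o=1}^{t-1} L_o(\theta)
     \qquad \text{(the terms } L_o(\theta_o) \text{ do not depend on } \theta\text{)}, \\
  L^{\mathrm{MT}}_t(\theta)
    &= \sum_{o=1}^{t} L_o(\theta)
     = \sum_{o=1}^{t-1} L_o(\theta) + L_t(\theta).
\end{align}
```

Under these assumed definitions the two objectives differ exactly by the current-task loss $L_t(\theta)$, so their minimisers need not coincide unless that term is added to the forgetting objective.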
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
