Towards guarantees for parameter isolation in continual learning

Giulia Lanzillotta; Sidak Pal Singh; Benjamin F. Grewe; Thomas Hofmann

arXiv:2310.01165 · cs.LG · October 3, 2023

Open Access · 3 Reviews

TL;DR

This paper investigates catastrophic forgetting in continual learning by analyzing the geometry of neural network loss landscapes, and it provides theoretical guarantees against forgetting for some parameter isolation methods.

Contribution

It offers a unifying geometric perspective on parameter isolation algorithms and establishes provable guarantees against catastrophic forgetting for some of them.

Findings

01 · Loss landscape geometry relates to forgetting

02 · Guarantees established for certain parameter isolation methods

03 · Unifying framework for continual learning algorithms

Abstract

Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catastrophic forgetting are lacking. In this work, we study the relationship between learning and forgetting by looking at the geometry of neural networks' loss landscape. We offer a unifying perspective on a family of continual learning algorithms, namely methods based on parameter isolation, and we establish guarantees on catastrophic forgetting for some of them.
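A minimal sketch of the geometric intuition, under simplifying assumptions that are ours and not the paper's: the loss of task 1 is exactly quadratic around its minimum with a rank-deficient Hessian `H1`, and training on task 2 is restricted to the null space of `H1`, i.e. to directions along which the old loss is flat. Under these assumptions the task-1 loss stays at its minimum while task 2 still makes progress, whereas unconstrained training on task 2 raises the task-1 loss. All names (`loss1`, `grad2`, `P_null`, `train_task2`) are hypothetical.

```python
import numpy as np

# Toy assumptions (illustrative, not the paper's setting):
# - task 1 has an exactly quadratic loss around its minimum theta1_star
#   with a rank-deficient Hessian H1, so some directions leave it flat;
# - task 2 has a full-rank quadratic loss with a different minimum.
rng = np.random.default_rng(0)
d, r = 6, 2                                  # parameter dimension, rank of H1

A = rng.standard_normal((r, d))
H1 = A.T @ A                                 # symmetric PSD, rank r
theta1_star = rng.standard_normal(d)

B = rng.standard_normal((d, d))
H2 = B.T @ B + np.eye(d)                     # symmetric positive definite
theta2_star = rng.standard_normal(d)


def loss1(theta):
    delta = theta - theta1_star
    return 0.5 * delta @ H1 @ delta


def loss2(theta):
    delta = theta - theta2_star
    return 0.5 * delta @ H2 @ delta


def grad2(theta):
    return H2 @ (theta - theta2_star)


# Orthogonal projector onto the null space of H1 (eigenvectors whose
# eigenvalues are numerically zero): moving along these directions
# does not change loss1.
eigvals, eigvecs = np.linalg.eigh(H1)
null_basis = eigvecs[:, eigvals < 1e-10 * eigvals.max()]
P_null = null_basis @ null_basis.T


def train_task2(theta, lr=0.01, steps=500, project=True):
    """Plain gradient descent on loss2, optionally projecting each gradient."""
    for _ in range(steps):
        g = grad2(theta)
        if project:
            g = P_null @ g                   # keep the update inside null(H1)
        theta = theta - lr * g
    return theta


theta_proj = train_task2(theta1_star.copy(), project=True)
theta_free = train_task2(theta1_star.copy(), project=False)

print("task-1 loss, projected:  ", loss1(theta_proj))   # ~0 up to round-off
print("task-1 loss, unprojected:", loss1(theta_free))   # typically > 0
print("task-2 loss: start", loss2(theta1_star),
      "projected", loss2(theta_proj), "unprojected", loss2(theta_free))
```

The exact null space exists here only because the toy Hessian is constructed to be rank-deficient; for real networks one would at best find approximately flat directions, and the quadratic model itself is only a local approximation.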

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- The authors construct a theoretical framework for parameter isolation strategies through the lens of the loss landscape.
- Several seminal continual learning methods can be analyzed and explained with their framework.

Weaknesses

- My major concern is the significance or novelty of the submission. The framework the authors formulate, i.e., analyzing null forgetting in continual learning through the loss landscape, is not entirely new given some previous works. Besides, although the authors analyze some continual learning methods with their framework, it would be more valuable if a new continual learning method guided by the theoretical framework were proposed. The current presentation also makes the experimental part wea

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 1

Strengths

The theoretical framework is powerful enough to include many existing continual learning techniques.

Weaknesses

The second-order analysis is restricted to the case where the learning rate is small.
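To illustrate why a second-order analysis is tied to small steps, here is a small sketch under our own assumptions: a toy log-cosh loss standing in for an old task and a random update direction `v` standing in for a new-task step (both hypothetical, not the paper's setup). It compares the exact change in loss after a step of size `lr` with the second-order Taylor prediction; the gap is generically of order lr³, so it shrinks rapidly for small learning rates and grows as the learning rate increases.

```python
import numpy as np

# Toy smooth, non-quadratic "old-task" loss (an illustrative assumption):
# L(theta) = sum_i log(cosh(theta_i)), with closed-form gradient and Hessian.
def loss(theta):
    return np.sum(np.log(np.cosh(theta)))


def grad(theta):
    return np.tanh(theta)


def hessian(theta):
    return np.diag(1.0 - np.tanh(theta) ** 2)


rng = np.random.default_rng(1)
theta = rng.standard_normal(5)     # parameters after learning the old task
v = rng.standard_normal(5)         # hypothetical update direction from a new task
g, H = grad(theta), hessian(theta)


def taylor_error(lr):
    """|exact change in L - second-order prediction| for the step lr * v."""
    step = lr * v
    exact = loss(theta + step) - loss(theta)
    second_order = g @ step + 0.5 * step @ H @ step
    return abs(exact - second_order)


lrs = [1e-3, 1e-2, 1e-1, 1.0]
errors = [taylor_error(lr) for lr in lrs]
for lr, err in zip(lrs, errors):
    print(f"lr={lr:g}  |exact - second-order| = {err:.2e}")
```

For small `lr` each tenfold increase in the step size should raise the error by roughly a factor of a thousand, which is the cubic remainder of the Taylor expansion at work.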

Reviewer 03 · Rating 1 (strong reject) · Confidence 5

Strengths

1. It is interesting to try to theoretically explain why parameter isolation methods may work (or not work) well. Past works have focussed mostly on regularisation and memory-based methods.
2. As far as I can tell (aside from one small question, listed later), the derivations of the theorems are correct and intuitively make sense to me.
3. I found the perturbation analysis technique in Section 5.1 interesting.

Weaknesses

1. At the end of Section 1.1, the paper claims that minimising average forgetting is equivalent to minimising the multi-task loss. I do not think this is true: there is a missing $L_t(\theta)$, i.e. the loss at the current task $t$. I don't think this was used anywhere, so this should be an easy fix. However, in Section 5, the authors say, "we report forgetting in terms of accuracy", which I did not understand. What does this mean (because forgetting is not equal to accuracy)?
2. I'm not sure if OGD
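The reviewer's first point can be made explicit with a short derivation. It assumes a common definition of forgetting that may differ from the paper's: the forgetting of task $i$ at parameters $\theta$ is $L_i(\theta) - L_i(\theta_i^*)$, where $\theta_i^*$ (hypothetical notation) denotes the parameters reached at the end of task $i$. Since each $L_i(\theta_i^*)$ is a constant in $\theta$, minimising average forgetting only involves the previous tasks, while the multi-task loss also contains the current task:

```latex
\bar{F}_t(\theta) = \frac{1}{t-1}\sum_{i=1}^{t-1}\bigl(L_i(\theta) - L_i(\theta_i^*)\bigr)
\quad\Longrightarrow\quad
\arg\min_{\theta} \bar{F}_t(\theta) = \arg\min_{\theta} \sum_{i=1}^{t-1} L_i(\theta),

\sum_{i=1}^{t} L_i(\theta) = \sum_{i=1}^{t-1} L_i(\theta) + L_t(\theta).
```

Under this definition the two objectives differ exactly by the current-task term $L_t(\theta)$, which is the missing term the reviewer points out.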

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition