Look-Ahead Selective Plasticity for Continual Learning of Visual Tasks

Rouzbeh Meshkinnejad; Jie Mei; Daniel Lizotte; Yalda Mohsenzadeh

arXiv:2311.01617 · cs.CV · November 6, 2023 · 2 cites

TL;DR

This paper introduces a novel continual learning method inspired by brain event models, using contrastive loss to identify and retain important network parameters during task transitions, leading to improved performance on vision benchmarks.

Contribution

It proposes a new mechanism that leverages contrastive loss at task boundaries to selectively preserve parameters, enhancing continual learning without relying on previous task data.

Findings

1. Achieves state-of-the-art results on CIFAR10 and TinyImagenet

2. Effective in task-incremental, class-incremental, and domain-incremental scenarios

3. Reduces catastrophic forgetting by selective parameter retention
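The three scenarios named above differ mainly in what the model must decide at test time. The sketch below illustrates that difference under stated assumptions: the function name `candidate_labels` is hypothetical, and the split of ten classes into five two-class tasks is an assumed example, not a detail given on this page.

```python
def candidate_labels(scenario, task_classes, task_id=None):
    """Return the label set a model must choose among at test time."""
    if scenario == "task":
        # Task-incremental: the task identity is provided, so only that
        # task's classes compete.
        if task_id is None:
            raise ValueError("task-incremental evaluation needs a task id")
        return list(task_classes[task_id])
    if scenario == "class":
        # Class-incremental: no task identity, so every class seen so far competes.
        return [c for classes in task_classes for c in classes]
    if scenario == "domain":
        # Domain-incremental: the label space is shared across tasks
        # (here the within-task index) while the input distribution changes.
        return list(range(len(task_classes[0])))
    raise ValueError(f"unknown scenario: {scenario}")


# Assumed example: ten classes split into five tasks of two classes each.
splits = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
task_il = candidate_labels("task", splits, task_id=1)
class_il = candidate_labels("class", splits)
domain_il = candidate_labels("domain", splits)
```

Class-incremental evaluation is usually the hardest of the three, because the model has to separate classes that it never saw together during training.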

Abstract

Contrastive representation learning has emerged as a promising technique for continual learning as it can learn representations that are robust to catastrophic forgetting and generalize well to unseen future tasks. Previous work in continual learning has addressed forgetting by using previous task data and trained models. Inspired by event models created and updated in the brain, we propose a new mechanism that takes place during task boundaries, i.e., when one task finishes and another starts. By observing the redundancy-inducing ability of contrastive loss on the output of a neural network, our method leverages the first few samples of the new task to identify and retain parameters contributing most to the transfer ability of the neural network, freeing up the remaining parts of the network to learn new features. We evaluate the proposed methods on benchmark computer vision datasets…
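As a rough illustration of the look-ahead idea in the abstract, the sketch below scores each output dimension by how much a supervised contrastive loss on a few new-task samples rises when that dimension is zeroed out, then keeps the top-scoring dimensions. This is an assumption-laden proxy, not the authors' procedure: the ablation-based score, the function names, the temperature, the `keep_fraction` parameter, and the toy embeddings are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def sup_contrastive_loss(embs, labels, temperature=0.5):
    """Supervised contrastive loss, averaged over anchors that have a positive."""
    z = [normalize(e) for e in embs]
    total, count = 0.0, 0
    for i in range(len(z)):
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(len(z))]
        others = [j for j in range(len(z)) if j != i]
        log_denom = math.log(sum(math.exp(sims[j]) for j in others))
        positives = [j for j in others if labels[j] == labels[i]]
        if positives:
            total += -sum(sims[j] - log_denom for j in positives) / len(positives)
            count += 1
    return total / max(count, 1)


def dimension_importance(embs, labels):
    """Assumed proxy: loss increase when one output dimension is zeroed out."""
    base = sup_contrastive_loss(embs, labels)
    scores = []
    for d in range(len(embs[0])):
        ablated = [[0.0 if k == d else x for k, x in enumerate(e)] for e in embs]
        scores.append(sup_contrastive_loss(ablated, labels) - base)
    return scores


def retention_mask(scores, keep_fraction=0.5):
    """Mark the top-scoring dimensions (1 = retain, 0 = free to change)."""
    k = max(1, int(len(scores) * keep_fraction))
    top = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)[:k]
    return [1 if d in top else 0 for d in range(len(scores))]


# Toy "first few samples" of a new task: dimension 0 separates the two
# classes, dimensions 1-3 carry only small noise.
embs = [
    [1.0, 0.10, -0.10, 0.05], [1.0, -0.10, 0.10, -0.05], [1.0, 0.05, 0.05, 0.10],
    [-1.0, 0.10, 0.10, -0.10], [-1.0, -0.05, -0.10, 0.05], [-1.0, -0.10, 0.05, 0.10],
]
labels = [0, 0, 0, 1, 1, 1]
scores = dimension_importance(embs, labels)
mask = retention_mask(scores, keep_fraction=0.25)
```

In this toy case only the class-separating dimension is marked for retention. A mask like this could then weight a distillation or regularization penalty so that retained dimensions are held close to the old model while the remaining ones stay free to learn new features; how the paper actually does this is not specified in the truncated abstract above.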

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- The argument that the proposed method improves the potential plasticity by reducing regularization on redundant dimensions of methods is nice. Measuring this on the current task data (rather than on the previous task data) is new.
- The proposed method obtains a decent performance gain for small memory sizes, especially on CIFAR10.

Weaknesses

- The idea of focusing on the importance of neurons for future (or current) tasks is new; many methods aim to measure the importance of neurons for previous tasks. However, the final difference between these strategies is very small (see Table 2), and in my opinion too small.
- I do not really like CIFAR10 for continual learning since the tasks are really small. I would like to also see results on CIFAR100 and, if possible, on ImageNet-subset.
- More results on the subset size should be added.

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

1. Overall, the paper is easy to follow. Using the look-ahead idea to estimate the importance of the model weights seems interesting.
2. The author provides many experiments and analyses to validate and reason about the proposed method.

Weaknesses

1. The look-ahead idea is not totally new in continual learning. The author did not discuss the relationship between the present work and seminal work such as "La-MAML: Look-ahead Meta-Learning for Continual Learning" (NeurIPS 2020). La-MAML has already considered using the initial batch of data to adapt the gradient for continual learning, which in general is related to the author's proposed masked distillation training and gradient modulation.
2. It is unclear why the paper needs to s

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. The paper is basically well-written and easy to follow.
2. The idea of event models is interesting. It’s good to see the connections between task boundaries and neurological mechanisms.

Weaknesses

1. I agree that the current continual learning methods focus more on stability than on plasticity/transfer. However, I think the technical contribution is incremental and not completely novel. The proposed method can be seen as an improved version of Co$^2$L. Also, the idea of “look-ahead” for new tasks has been widely discussed in recent literature, such as learning and combining the new task solution [1] [2]. These related works should be discussed and compared (at least conceptually).
2. The pr

Code & Models

Repositories

crouzbehmeshkin/lasp
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications