HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning

Kamil Książek; Przemysław Spurek

arXiv:2310.00113 · cs.LG · May 27, 2024

TL;DR

HyperMask is a hypernetwork-based method that dynamically generates task-specific sparse subnetworks for continual learning. It mitigates catastrophic forgetting, achieves competitive results on multiple CL datasets, and surpasses state-of-the-art scores in some scenarios.

Contribution

The paper proposes HyperMask, a novel approach using semi-binary masks and the lottery ticket hypothesis to create adaptive, task-specific subnetworks within a single network for continual learning.

Findings

01

HyperMask achieves competitive results on multiple CL datasets.

02

It surpasses state-of-the-art scores in some scenarios.

03

The method effectively mitigates catastrophic forgetting.

Abstract

Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. Many continual learning (CL) strategies are trying to overcome this problem. One of the most effective is the hypernetwork-based approach. The hypernetwork generates the weights of a target model based on the task's identity. The model's main limitation is that, in practice, the hypernetwork can produce completely different architectures for subsequent tasks. To solve such a problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, named winning tickets, that preserve the performance of a whole network. In the paper, we propose a method called HyperMask, which dynamically filters a target network depending on the CL task. The hypernetwork produces semi-binary masks to obtain dedicated target subnetworks. Moreover, due to the…
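
The mechanism described in the abstract (a hypernetwork that turns a task identity into a semi-binary mask, which then filters a shared target network) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the one-layer hypernetwork, the `tanh` squashing, the magnitude-based zeroing rule, the reading of "semi-binary" as "exactly zero or a real value in (0, 1]", and every name and constant (`make_hypernetwork`, `semi_binary_mask`, `apply_mask`, `SPARSITY`) are hypothetical.

```python
import math
import random


def make_hypernetwork(embedding_dim, num_weights, seed=0):
    """Return a toy one-layer hypernetwork (hypothetical, illustration only).

    It maps a task embedding to one raw score in [-1, 1] per target weight.
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
                  for _ in range(num_weights)]

    def hypernetwork(task_embedding):
        return [math.tanh(sum(p * e for p, e in zip(row, task_embedding)))
                for row in projection]

    return hypernetwork


def semi_binary_mask(raw_scores, sparsity):
    """Zero the smallest-magnitude fraction of scores; keep the rest as magnitudes.

    Assumption: "semi-binary" means each entry is either exactly 0 or a
    real value in (0, 1]. The zeroing rule here is a guess, not the paper's.
    """
    num_zeroed = int(sparsity * len(raw_scores))
    by_magnitude = sorted(range(len(raw_scores)), key=lambda i: abs(raw_scores[i]))
    zeroed = set(by_magnitude[:num_zeroed])
    return [0.0 if i in zeroed else abs(score)
            for i, score in enumerate(raw_scores)]


def apply_mask(target_weights, mask):
    """Element-wise product: masked-out weights contribute nothing for this task."""
    return [w * m for w, m in zip(target_weights, mask)]


# Hypothetical sizes and sparsity level, chosen only for the demonstration.
NUM_WEIGHTS, EMBEDDING_DIM, SPARSITY = 10, 4, 0.5

rng = random.Random(42)
target_weights = [rng.gauss(0.0, 1.0) for _ in range(NUM_WEIGHTS)]
task_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
                   for _ in range(2)]

hypernetwork = make_hypernetwork(EMBEDDING_DIM, NUM_WEIGHTS)
masks = [semi_binary_mask(hypernetwork(e), SPARSITY) for e in task_embeddings]
subnetworks = [apply_mask(target_weights, m) for m in masks]

for task_id, mask in enumerate(masks):
    kept = sum(1 for m in mask if m != 0.0)
    print(f"task {task_id}: {kept}/{NUM_WEIGHTS} weights kept")
```

The point of the sketch is only that a single set of target weights is shared, while each task embedding yields its own mask and therefore its own subnetwork. Training of the embeddings, the hypernetwork, and the target weights is omitted.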

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

(1) Hypernetworks are used for continual learning in a different way: to generate hypermasks for each task. (2) It connects to lottery ticket theory with a single network for continual learning. (3) The method is easy to follow in general.

Weaknesses

(1) Some works mentioned in the related work use masks as an extension of the whole network, so it is unclear what benefits the hypernetwork brings. (2) Several common loss functions are used in the method, and it is unclear whether the improvements come from the proposed hypernetworks or from the additional regularizations. There is no ablation study on these components. (3) The experimental evaluation is very limited. It only compares with other methods on very tiny benchmarks. The perf

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The paper introduces an innovative approach, HyperMask, that combines the concepts of hypernetworks and the lottery ticket hypothesis. This unique combination leads to a novel method for addressing catastrophic forgetting in continual learning. 2. The idea of using semi-binary masks generated by a hypernetwork to create target subnetworks is a fresh and creative approach to tackling the challenges of continual learning. 3. The paper demonstrates a high level of quality in the experimental e

Weaknesses

1. The primary contribution of HyperMask, which involves using hypernetworks to produce semi-binary masks for continual learning, may not be considered highly novel in the field of continual learning and neural network architectures. Hypernetworks have been explored in prior research as a means to generate task-specific weights for neural networks [1][2], and the concept of using masks or pruning for model adaptation is not entirely new. 2. The paper lacks a deeper theoretical analysis of the pro

Reviewer 03 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

- The method itself is technically reasonable and the details of the method are well described in the paper.

Weaknesses

- The baselines in this paper are relatively outdated. I understand that incremental architecture methods no longer dominate this field of research, but that does not mean the authors can ignore regularization-based or representation-based methods. For example, FeCAM [1] does not even need to know the task index and it still outperforms the authors' method.
- The amount of experimental results is severely insufficient. According to the paper, HyperMask is superior to other methods only in Split-

Code & Models

Repositories

gmum/hypermask
PyTorch · Official

Taxonomy

Topics: Innovative Teaching and Learning Methods · Online and Blended Learning · Intelligent Tutoring Systems and Adaptive Learning

Methods: HyperNetwork