Overcoming the Stability Gap in Continual Learning

Md Yousuf Harun; Christopher Kanan

arXiv:2306.01904·cs.CV·September 18, 2024·2 cites

Open Access · 3 Reviews

TL;DR

This paper investigates the stability gap in continual learning for large pre-trained neural networks, proposing a method to reduce this gap and improve computational efficiency in real-world applications.

Contribution

The paper identifies the stability gap as a key obstacle in continual learning and introduces a novel method that significantly reduces this gap in large-scale experiments.

Findings

01 The proposed method reduces the stability gap in class incremental learning.

02 It greatly improves computational efficiency in large-scale continual learning.

03 The approach aligns continual learning with practical production needs.

Abstract

Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making business decisions and to serve users; however, a major problem is model decay, where the DNN's predictions become more erroneous over time, resulting in revenue loss or unhappy users. To mitigate model decay, DNNs are retrained from scratch using old and new data. This is computationally expensive, so retraining happens only once performance significantly decreases. Here, we study how continual learning (CL) could potentially overcome model decay in large pre-trained DNNs and greatly reduce computational costs for keeping DNNs up-to-date. We identify the "stability gap" as a major obstacle in our setting. The stability gap refers to a phenomenon where learning new data causes large drops in performance for past tasks before CL mitigation methods eventually compensate for this drop. We test two…
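
As an illustration of the phenomenon described in the abstract, the drop and later recovery can be read off a trace of old-task accuracy recorded while a new task is being learned. The sketch below is a minimal, hypothetical example and not the paper's own metrics: the function name `stability_gap`, the example trace, and the two summary numbers it returns are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def stability_gap(old_task_acc: Sequence[float]) -> Tuple[float, float]:
    """Summarize a trace of old-task accuracy (in percent).

    old_task_acc[0] is the accuracy just before learning the new task;
    later entries are measured at successive points during that learning.
    Returns (max_drop, final_drop), both relative to the starting accuracy.
    This is an illustrative summary, not the metric defined in the paper.
    """
    if len(old_task_acc) == 0:
        raise ValueError("accuracy trace must not be empty")
    start = old_task_acc[0]
    worst = min(old_task_acc)   # lowest point reached during learning
    final = old_task_acc[-1]    # accuracy once learning has finished
    return start - worst, start - final


# Hypothetical trace: accuracy falls sharply, then mostly recovers.
trace = [80, 55, 60, 70, 78]
max_drop, final_drop = stability_gap(trace)
print(f"largest transient drop: {max_drop} points")
print(f"drop remaining at the end: {final_drop} points")
```

A large `max_drop` paired with a small `final_drop` is the pattern the abstract describes: performance on past tasks collapses early and is only compensated for later in training.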

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

1. The stability gap is a recently discovered phenomenon, and mitigating it can play a crucial role in improving the efficiency of continual learning.
2. The paper proposes new metrics, which differ slightly from those of De Lange et al., to measure the stability gap in class incremental learning.
3. It offers extensive ablation studies that clarify the effect of each component.

Weaknesses

1. The proposed method is evaluated in constrained settings. The network is ConvNeXtV1-Tiny, which is seldom used in previous continual learning literature. The applicability of the proposed method to other network architectures, such as ResNet or simple CNNs, remains unclear. Additionally, the buffer size is much larger than those used in previous literature.
2. The effectiveness in reducing the stability gap depends heavily on the use of a model pretrained on ImageNet-1K. As sh

Reviewer 02 · Rating 5: marginally below the acceptance threshold · Confidence 5

Strengths

1. It is good to see the authors' efforts to address the recently identified stability gap issue in continual learning.
2. The writing is relatively clear and easy to follow.
3. The three proposed metrics for measuring the stability gap seem novel.

Weaknesses

1. The extremely relaxed continual learning setting limits the contribution of the proposed method. In this setting, the continual learner has a strong pre-trained model and can access all prior source data for data replay (e.g., ImageNet-1K), which is unusual in continual learning.
1.1 Such a setting narrows the proposed hypothesis to continual learning with a pre-trained model. Meanwhile, the proposed techniques are specially designed for that. With the pre-trained model, some of

Reviewer 03 · Rating 3: reject, not good enough · Confidence 4

Strengths

- The stability gap is a compelling topic that merits further exploration.
- The authors have undertaken thorough experiments to substantiate their contributions.

Weaknesses

- In Section 3.1 (second paragraph), the rationale behind setting f=0.3 is not clearly explained. A more detailed explanation in the experimental section would be beneficial.
- While the main experiments use ConvNeXtV2-Femto, which has undergone unsupervised pretraining on ImageNet-1K followed by supervised fine-tuning, it would be advantageous to also present results for training ResNet18 from scratch. This is important given the common use of ResNet18 in the CL community.
- Table 1 could be

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning
