On the Convergence of Continual Learning with Adaptive Methods

Seungyub Han; Yeongmo Kim; Taehyun Cho; Jungwoo Lee

arXiv:2404.05555 · cs.LG · April 16, 2024 · 1 citation

TL;DR

This paper analyzes the convergence behavior of continual learning, identifies degradation issues, and introduces an adaptive method that improves performance by adjusting step sizes to mitigate forgetting.

Contribution

It provides the first convergence analysis for memory-based continual learning with SGD and proposes an adaptive algorithm that enhances learning stability and performance.

Findings

01. Training current tasks degrades previous tasks over time
02. The proposed adaptive method matches SGD convergence rates when forgetting is controlled
03. Empirical results show improved performance on image classification tasks

Abstract

One of the objectives of continual learning is to prevent catastrophic forgetting when learning multiple tasks sequentially, and existing solutions have been driven by the conceptualization of the plasticity-stability dilemma. However, the convergence of continual learning for each sequential task has been less studied so far. In this paper, we provide a convergence analysis of memory-based continual learning with stochastic gradient descent, along with empirical evidence that training on current tasks causes cumulative degradation of previous tasks. We propose an adaptive method for nonconvex continual learning (NCCL), which adjusts the step sizes of both previous and current tasks using the gradients. The proposed method can achieve the same convergence rate as the SGD method when the catastrophic forgetting term, which we define in the paper, is suppressed at each iteration. Further, we demonstrate…
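To make the setting in the abstract concrete, here is a minimal toy sketch of memory-based continual learning with SGD. It is not the NCCL algorithm, whose update rule is not given on this page. Everything in it is an assumption made for illustration: the two quadratic task losses, the function names, the separate step sizes `alpha` (memory gradient) and `beta` (current-task gradient), and the rule that halves `beta` when the two gradients conflict.

```python
import random


def grad_quadratic(x, center):
    """Gradient of the toy loss 0.5 * ||x - center||^2."""
    return [xi - ci for xi, ci in zip(x, center)]


def loss_quadratic(x, center):
    """Toy task loss 0.5 * ||x - center||^2."""
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, center))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_current_task(x0, prev_center, cur_center, steps=200,
                       alpha=0.05, beta=0.05, use_memory=True,
                       adapt=False, noise=0.01, seed=0):
    """Run noisy gradient steps on the current task, optionally with replay.

    alpha: step size for the previous-task (memory) gradient.
    beta:  step size for the current-task gradient.
    adapt: hypothetical rule (NOT the paper's): halve beta on any step
           where the two noisy gradients have a negative inner product.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Gaussian noise stands in for minibatch sampling.
        g_prev = [g + rng.gauss(0.0, noise) for g in grad_quadratic(x, prev_center)]
        g_cur = [g + rng.gauss(0.0, noise) for g in grad_quadratic(x, cur_center)]
        a = alpha if use_memory else 0.0
        b = beta
        if adapt and dot(g_prev, g_cur) < 0:
            b = 0.5 * beta  # conflicting gradients: take a smaller current-task step
        x = [xi - a * gp - b * gc for xi, gp, gc in zip(x, g_prev, g_cur)]
    return x


if __name__ == "__main__":
    prev_center, cur_center = [0.0, 0.0], [2.0, 0.0]
    start = list(prev_center)  # parameters already fitted to the previous task
    for label, kwargs in [
        ("no memory", dict(use_memory=False)),
        ("memory replay", dict(use_memory=True)),
        ("memory + adaptive step", dict(use_memory=True, adapt=True)),
    ]:
        x = train_current_task(start, prev_center, cur_center, **kwargs)
        print(label, "previous-task loss:", round(loss_quadratic(x, prev_center), 3))
```

In this toy, training on the current task alone moves the parameters away from the previous task's minimum, which raises the previous-task loss. Replaying the memory gradient limits that drift, and shrinking the current-task step when the gradients conflict limits it further, at the cost of a higher current-task loss. The paper's actual forgetting term and step-size rule should be taken from the full text.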

Taxonomy

Topics: Ideological and Political Education

Methods: Stochastic Gradient Descent