On the Convergence of Continual Learning with Adaptive Methods
Seungyub Han, Yeongmo Kim, Taehyun Cho, Jungwoo Lee

TL;DR
This paper analyzes the convergence behavior of continual learning, identifies degradation issues, and introduces an adaptive method that improves performance by adjusting step sizes to mitigate forgetting.
Contribution
It provides the first convergence analysis for memory-based continual learning with SGD and proposes an adaptive algorithm that enhances learning stability and performance.
Findings
Training on current tasks cumulatively degrades performance on previous tasks
The proposed adaptive method matches SGD convergence rates when forgetting is controlled
Empirical results show improved performance on image classification tasks
Abstract
One of the objectives of continual learning is to prevent catastrophic forgetting in learning multiple tasks sequentially, and the existing solutions have been driven by the conceptualization of the plasticity-stability dilemma. However, the convergence of continual learning for each sequential task has been less studied so far. In this paper, we provide a convergence analysis of memory-based continual learning with stochastic gradient descent and empirical evidence that training current tasks causes the cumulative degradation of previous tasks. We propose an adaptive method for nonconvex continual learning (NCCL), which adjusts step sizes of both previous and current tasks with the gradients. The proposed method can achieve the same convergence rate as the SGD method when the catastrophic forgetting term, which we define in the paper, is suppressed at each iteration. Further, we demonstrate…
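To make the idea of separate step sizes for previous and current tasks concrete, here is a minimal sketch of one memory-based update. It is not the paper's NCCL rule, which the abstract does not spell out. The function names (`replay_sgd_step`, `adapt_step_sizes`) and the step-size rule are hypothetical, chosen for illustration only. The rule assumed here caps the current-task step size whenever the two gradients conflict, so that the first-order change in the previous-task loss is non-positive.

```python
import numpy as np


def replay_sgd_step(x, grad_prev, grad_cur, alpha, beta):
    """One memory-based update with separate step sizes.

    grad_prev: gradient of the previous-task loss on memory samples.
    grad_cur:  gradient of the current-task loss on new samples.
    """
    return x - alpha * grad_prev - beta * grad_cur


def adapt_step_sizes(grad_prev, grad_cur, alpha0, beta0):
    """Hypothetical adaptive rule (an assumption, not the paper's method).

    The first-order change in the previous-task loss after the update is
        -alpha * ||grad_prev||^2 - beta * <grad_prev, grad_cur>.
    If the inner product is negative (the gradients conflict), beta is
    capped so that this change stays non-positive.
    """
    dot = float(np.dot(grad_prev, grad_cur))
    if dot >= 0.0:
        # No conflict: the current-task step does not hurt previous tasks
        # to first order, so the base step sizes are kept.
        return alpha0, beta0
    beta_max = alpha0 * float(np.dot(grad_prev, grad_prev)) / (-dot)
    return alpha0, min(beta0, beta_max)


# Toy example with conflicting gradients.
g_prev = np.array([1.0, 0.0])
g_cur = np.array([-2.0, 1.0])
alpha, beta = adapt_step_sizes(g_prev, g_cur, alpha0=0.1, beta0=0.1)
x_new = replay_sgd_step(np.zeros(2), g_prev, g_cur, alpha, beta)

# First-order change in the previous-task loss caused by this step.
first_order_change = float(np.dot(g_prev, x_new - np.zeros(2)))
```

In this toy case the current-task step size is halved, and the resulting update leaves the previous-task loss unchanged to first order. A real implementation would estimate both gradients from mini-batches, so the guarantee would hold only in expectation at best.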
Taxonomy
Topics: Ideological and Political Education
Methods: Stochastic Gradient Descent
