Understanding Forgetting in Continual Learning with Linear Regression
Meng Ding, Kaiyi Ji, Di Wang, Jinhui Xu

TL;DR
This paper provides a theoretical analysis of forgetting in continual learning using linear regression, revealing how task order and step size influence forgetting, supported by simulations on linear models and DNNs.
Contribution
The paper introduces a general theoretical framework for analyzing forgetting in linear regression trained with SGD, accounting for both the task sequence and algorithmic parameters such as the step size.
Findings
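Given a sufficiently large data size, task sequences in which tasks with larger eigenvalues are trained later exhibit more forgetting.
An appropriate choice of step size mitigates forgetting in both the underparameterized and overparameterized regimes.
Simulations on linear models and DNNs support the theoretical insights.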
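To make the quantities in these findings concrete, here is a minimal sketch of how forgetting could be measured in a toy continual linear regression run. It is not the paper's setup or its experiments. Several choices are assumptions made only for illustration: each task has isotropic Gaussian features (so a single number `scale` stands in for all eigenvalues of its population covariance), each task has its own ground-truth vector, and forgetting is taken to be the increase in the first task's population loss after the rest of the sequence is trained. All function names are hypothetical.

```python
import random


def population_loss(w, w_star, scale):
    """Excess population risk 0.5 * E[(x . (w - w_star))^2] for x ~ N(0, scale * I)."""
    return 0.5 * scale * sum((a - b) ** 2 for a, b in zip(w, w_star))


def sgd_on_task(w, w_star, scale, steps, step_size, rng, noise_std=0.1):
    """One-pass SGD on freshly sampled data from a single linear regression task."""
    w = list(w)
    std = scale ** 0.5  # feature standard deviation for covariance scale * I
    for _ in range(steps):
        x = [rng.gauss(0.0, std) for _ in w]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, noise_std)
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        # Gradient of 0.5 * residual^2 with respect to w is residual * x.
        w = [wi - step_size * residual * xi for wi, xi in zip(w, x)]
    return w


def forgetting_on_first_task(tasks, steps, step_size, seed=0):
    """First-task loss after the whole sequence minus its loss right after it was trained.

    `tasks` is a list of (w_star, scale) pairs, trained in the given order.
    This definition of forgetting is an illustrative assumption.
    """
    rng = random.Random(seed)
    w = [0.0] * len(tasks[0][0])
    loss_after_first = None
    for index, (w_star, scale) in enumerate(tasks):
        w = sgd_on_task(w, w_star, scale, steps, step_size, rng)
        if index == 0:
            loss_after_first = population_loss(w, *tasks[0])
    return population_loss(w, *tasks[0]) - loss_after_first


# Two hypothetical tasks: one with a small covariance scale, one with a large one.
small = ([1.0, 0.0, 0.0], 0.5)
large = ([0.0, 1.0, 0.0], 2.0)

for eta in (0.01, 0.05):
    large_later = forgetting_on_first_task([small, large], 500, eta)
    large_first = forgetting_on_first_task([large, small], 500, eta)
    print(f"step size {eta}: large-later {large_later:.4f}, large-first {large_first:.4f}")
```

Which ordering forgets more in this toy depends on the assumptions above (in particular the per-task ground truths and the loss weighting), so the output should be read as a way to explore the interaction between task order and step size rather than as a reproduction of the reported result.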
Abstract
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to catastrophic forgetting, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both underparameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Linear Regression
