Understanding Forgetting in Continual Learning with Linear Regression

Meng Ding; Kaiyi Ji; Di Wang; Jinhui Xu

arXiv:2405.17583 · cs.LG · May 29, 2024 · 1 citation

Open Access

TL;DR

This paper provides a theoretical analysis of forgetting in continual learning using linear regression, revealing how task order and step size influence forgetting, supported by simulations on linear models and DNNs.

Contribution

It introduces a general theoretical framework for understanding forgetting in linear regression with SGD, considering task sequence and algorithm parameters.

Findings

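01. Given a sufficiently large data size, task orderings that train larger-eigenvalue tasks later increase forgetting.

02. An appropriate step size mitigates forgetting in both the underparameterized and overparameterized regimes.

03. Simulations on linear models and DNNs support the theoretical insights.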
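
The findings above concern how task order and step size affect forgetting when linear regression tasks are trained one after another with SGD. A minimal sketch of how such a quantity might be measured is given below. Everything in it is an illustrative assumption rather than the paper's setup: the function names, the diagonal covariances, the shared true weights `w_star`, the noise level, and the definition of forgetting as the change in excess risk on the first task. It is a harness for experimenting with orderings and step sizes, and is not guaranteed to reproduce the paper's results.

```python
import random


def sample_task(eigs, w_star, n, noise_std, rng):
    """Draw n pairs (x, y) with x_i ~ N(0, eigs[i]) and y = <w_star, x> + noise."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, lam ** 0.5) for lam in eigs]
        y = sum(w * xi for w, xi in zip(w_star, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


def sgd_pass(w, data, step_size):
    """One pass of single-sample SGD on the squared loss 0.5 * (<w, x> - y)^2."""
    w = list(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - step_size * err * xi for wi, xi in zip(w, x)]
    return w


def excess_risk(w, eigs, w_star):
    """Population excess risk for a diagonal covariance with eigenvalues eigs."""
    return 0.5 * sum(lam * (wi - si) ** 2 for lam, wi, si in zip(eigs, w, w_star))


def forgetting_on_first_task(task_eigs, w_star, n, step_size, noise_std=0.1, seed=0):
    """Excess risk on task 1 after the whole sequence minus right after task 1.

    This definition is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    w = [0.0] * len(w_star)
    risk_after_first = None
    for k, eigs in enumerate(task_eigs):
        w = sgd_pass(w, sample_task(eigs, w_star, n, noise_std, rng), step_size)
        if k == 0:
            risk_after_first = excess_risk(w, task_eigs[0], w_star)
    return excess_risk(w, task_eigs[0], w_star) - risk_after_first


# Hypothetical eigenvalue profiles for two tasks and a shared target.
small = [0.5, 0.25, 0.125]
large = [2.0, 1.0, 0.5]
w_star = [1.0, -1.0, 0.5]

for name, order in [("small -> large", [small, large]),
                    ("large -> small", [large, small])]:
    f = forgetting_on_first_task(order, w_star, n=500, step_size=0.05)
    print(f"{name}: forgetting on first task = {f:.6f}")
```

Varying `step_size`, `n`, and the eigenvalue profiles gives a quick way to probe how the ordering interacts with the algorithmic parameters in this toy setting. The measured value can be negative, which would indicate that training on the later task also reduced the first task's excess risk.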

Abstract

Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to catastrophic forgetting, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both underparameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Linear Regression