How catastrophic can catastrophic forgetting be in linear regression?
Itay Evron, Edward Moroshko, Rachel Ward, Nati Srebro, Daniel Soudry

TL;DR
This paper analyzes how severe catastrophic forgetting can be when an overparameterized linear model is fit to a sequence of tasks. It derives exact expressions and bounds for the forgetting and connects the setting to classical methods such as alternating projections and the Kaczmarz method.
Contribution
It offers the first exact expressions and bounds for forgetting in linear models, linking continual learning to alternating projections and the Kaczmarz method, and explores effects of task ordering.
Findings
Upper bound of T^2 * min{1/sqrt(k), d/k} on forgetting when T tasks in d dimensions are presented cyclically for k iterations
Forgetting can be significantly reduced with random task ordering
Forgetting behaves differently from convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results
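
To get a feel for the cyclic bound, the short sketch below evaluates T^2 * min{1/sqrt(k), d/k} exactly as quoted in the abstract. The function name `forgetting_bound` is hypothetical, and any constants or normalization assumptions from the paper's formal statement are not reproduced here. The two terms cross at k = d^2: for fewer iterations the dimension-free 1/sqrt(k) term is smaller, and for more iterations the d/k term takes over.

```python
import math


def forgetting_bound(T: int, d: int, k: int) -> float:
    """Evaluate T^2 * min(1/sqrt(k), d/k) as quoted in the abstract.

    T: number of tasks, d: dimension, k: number of iterations.
    Constants from the formal theorem are omitted (assumption).
    """
    return T**2 * min(1.0 / math.sqrt(k), d / k)


if __name__ == "__main__":
    T, d = 2, 10
    for k in (25, 100, 10_000):
        # Below k = d^2 the 1/sqrt(k) term is the minimum; above it, d/k is.
        regime = "1/sqrt(k)" if k <= d**2 else "d/k"
        print(f"k={k:>6}  bound={forgetting_bound(T, d, k):.4f}  active term: {regime}")
```

Note that for small k the expression can exceed any trivial limit on the forgetting, so it only becomes informative once k is large relative to T.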
Abstract
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions. We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds. We establish connections between continual learning in the linear setting and two other research areas: alternating projections and the Kaczmarz method. In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas. In particular, when T tasks in d dimensions are presented cyclically for k iterations, we prove an upper bound of T^2 * min{1/sqrt(k), d/k} on the forgetting. This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results.…
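
The setting in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: each task is fit exactly with the minimum-norm update (a projection onto that task's solution set, which is the link to alternating projections and block Kaczmarz), all tasks share one ground-truth weight vector, and forgetting is taken to be the average squared residual on the tasks seen so far. The names `fit_task` and `forgetting`, the problem sizes, and the data normalization are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_task(w, X, y):
    """Minimum-norm update of w that fits (X, y) exactly.

    Equivalent to projecting w onto the affine set {v : X v = y}.
    """
    return w + np.linalg.pinv(X) @ (y - X @ w)


def forgetting(w, tasks):
    """Average squared residual of w over the given tasks (assumed definition)."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks]))


rng = np.random.default_rng(0)
d, T, n = 50, 5, 4                 # dimension, tasks, samples per task (illustrative)
w_star = rng.standard_normal(d)    # shared ground truth, so tasks are jointly realizable

tasks = []
for _ in range(T):
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, 2)      # scale each task to spectral norm 1 (assumption)
    tasks.append((X, X @ w_star))

w = np.zeros(d)
history, distances = [], []
for k in range(1, 201):
    X, y = tasks[(k - 1) % T]      # cyclic task ordering
    w = fit_task(w, X, y)
    history.append(forgetting(w, tasks[: min(k, T)]))
    distances.append(float(np.linalg.norm(w - w_star)))

print("forgetting after first cycle:", history[T - 1])
print("forgetting after 200 iterations:", history[-1])
```

Replacing the cyclic index with a random draw such as `rng.integers(T)` gives a random task ordering, which can be compared against the cyclic run.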
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
