Analysis of Overparameterization in Continual Learning under a Linear Model
Daniel Goldfarb, Paul Hand

TL;DR
This paper provides a theoretical analysis showing that overparameterization in linear models can reduce catastrophic forgetting in continual learning, offering insights into how model capacity impacts learning sequential tasks.
Contribution
It demonstrates analytically that overparameterization mitigates forgetting in linear models during continual learning, and establishes a risk bound relevant to double descent theory.
Findings
Overparameterization reduces catastrophic forgetting in linear models.
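In the two-task setting, a sufficiently high overparameterization ratio yields a low-risk estimator for the first task, even after training on both tasks in sequence.
The paper establishes a non-asymptotic risk bound for linear regression.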
Abstract
Autonomous machine learning systems that learn many tasks in sequence are prone to the catastrophic forgetting problem. Mathematical theory is needed in order to understand the extent of forgetting during continual learning. As a foundational step towards this goal, we study continual learning and catastrophic forgetting from a theoretical perspective in the simple setting of gradient descent with no explicit algorithmic mechanism to prevent forgetting. In this setting, we analytically demonstrate that overparameterization alone can mitigate forgetting in the context of a linear regression model. We consider a two-task setting motivated by permutation tasks, and show that as the overparameterization ratio becomes sufficiently high, a model trained on both tasks in sequence results in a low-risk estimator for the first task. As part of this work, we establish a non-asymptotic bound of…
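The effect described in the abstract can be illustrated with a small numerical sketch. This is a toy setup, not the paper's exact data model. It assumes noise-free labels, inputs that lie in a random d-dimensional subspace of a p-dimensional space, and a second task whose inputs are a fixed random permutation of the coordinates. The names gd_limit, task1_risks, U, theta, and perm are hypothetical. Gradient descent is replaced by its known limit for squared loss: the least-squares solution closest to the starting weights.

```python
import numpy as np

def gd_limit(w_init, X, y):
    # Least-squares solution closest to w_init. Gradient descent on squared
    # loss started at w_init converges to this point (small enough step size).
    step = np.linalg.lstsq(X, y - X @ w_init, rcond=1e-10)[0]
    return w_init + step

def task1_risks(p, n=20, d=5, seed=0):
    """Return task-1 risk after training on task 1, then after task 2."""
    rng = np.random.default_rng(seed)
    # Assumption: inputs are x = U z with z ~ N(0, I_d) and labels y = <theta, z>.
    U, _ = np.linalg.qr(rng.standard_normal((p, d)))  # p x d, orthonormal columns
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)
    perm = rng.permutation(p)  # task 2 permutes the input coordinates

    Z1 = rng.standard_normal((n, d))
    Z2 = rng.standard_normal((n, d))
    X1, y1 = Z1 @ U.T, Z1 @ theta
    X2, y2 = (Z2 @ U.T)[:, perm], Z2 @ theta

    w1 = gd_limit(np.zeros(p), X1, y1)  # train on task 1 from zero
    w2 = gd_limit(w1, X2, y2)           # continue on task 2, no anti-forgetting mechanism

    def risk(w):
        # Expected squared error on task 1: E[(x^T w - z^T theta)^2] = ||U^T w - theta||^2.
        return float(np.sum((U.T @ w - theta) ** 2))

    return risk(w1), risk(w2)

if __name__ == "__main__":
    n = 20
    for p in (40, 400, 4000):
        after = np.mean([task1_risks(p, n=n, seed=s)[1] for s in range(5)])
        print(f"p={p:5d}  p/n={p / n:6.1f}  task-1 risk after task 2: {after:.5f}")
```

In this sketch the task-1 risk right after the first task is essentially zero, so whatever risk remains after the second task is forgetting. Increasing p with n fixed raises the overparameterization ratio p/n, and the two tasks' input subspaces become closer to orthogonal, which is one intuition for why the second round of training disturbs the first task less.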
Taxonomy
Topics: Educational Technology and Assessment · Advanced Sensor and Control Systems · Face and Expression Recognition
Methods: Linear Regression
