The Importance of Being Lazy: Scaling Limits of Continual Learning
Jacopo Graldi, Alessandro Breccia, Giulia Lanzillotta, Thomas Hofmann, Lorenzo Noci

TL;DR
This paper investigates how model scale and feature learning influence catastrophic forgetting in neural networks during continual learning, revealing a transition between lazy and rich regimes and identifying optimal feature learning levels.
Contribution
The paper introduces a unified framework that distinguishes lazy from rich training regimes, and it extends the theory of catastrophic forgetting, previously limited to the lazy regime, to the feature learning regime at infinite width.
Findings
Increasing model width is beneficial only when it reduces the amount of feature learning.
High feature learning leads to more forgetting in dissimilar tasks.
Optimal performance occurs at a critical level of feature learning.
Abstract
Despite recent efforts, neural networks still struggle to learn in non-stationary environments, and our understanding of catastrophic forgetting (CF) is far from complete. In this work, we perform a systematic study on the impact of model scale and the degree of feature learning in continual learning. We reconcile existing contradictory observations on scale in the literature, by differentiating between lazy and rich training regimes through a variable parameterization of the architecture. We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness. Using the framework of dynamical mean field theory, we then study the infinite width dynamics of the model in the feature learning regime and characterize CF, extending prior theoretical results limited to the lazy regime. We study the intricate relationship between feature…
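The lazy/rich distinction mentioned in the abstract can be illustrated with a small numerical sketch. The code below does not reproduce the paper's parameterization or its dynamical mean field analysis. It swaps in the output-scaling construction common in the lazy-training literature: a centered two-layer network whose output is multiplied by a factor `alpha`, with the learning rate divided by `alpha**2`. The function name `relative_feature_change`, the toy data, and all hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

def relative_feature_change(alpha, width=256, steps=300, lr=0.01, seed=0):
    """Train a toy two-layer tanh network and report how far the hidden
    features move from initialization (relative Frobenius norm).

    Hypothetical illustration: output scale `alpha` with step size
    lr / alpha**2. Large alpha should approach the lazy regime.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((32, 8))          # toy inputs (assumed)
    y = np.sin(X[:, 0])                       # toy regression target (assumed)

    W0 = rng.standard_normal((8, width)) / np.sqrt(8)
    a0 = rng.standard_normal(width) / np.sqrt(width)
    W, a = W0.copy(), a0.copy()

    H0 = np.tanh(X @ W0)                      # features at initialization
    f0 = H0 @ a0                              # subtract so the output starts at 0

    step = lr / alpha**2
    for _ in range(steps):
        H = np.tanh(X @ W)
        out = alpha * (H @ a - f0)
        err = (out - y) / len(y)              # gradient of 0.5 * mean((out - y)^2)
        grad_a = alpha * (H.T @ err)
        grad_H = alpha * np.outer(err, a)
        grad_W = X.T @ (grad_H * (1.0 - H**2))  # backprop through tanh
        a -= step * grad_a
        W -= step * grad_W

    H = np.tanh(X @ W)
    return np.linalg.norm(H - H0) / np.linalg.norm(H0)

if __name__ == "__main__":
    for alpha in (1.0, 10.0, 100.0):
        print(f"alpha={alpha:6.1f}  relative feature change="
              f"{relative_feature_change(alpha):.2e}")
```

Under these assumptions, larger `alpha` keeps the hidden representation closer to its initial value while the function-space updates stay comparable. This is the sense in which a parameterization knob can trade feature learning for laziness.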
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Face recognition and analysis
