Learning Continually by Spectral Regularization
Alex Lewandowski, Michał Bortkiewicz, Saurabh Kumar, András György, Dale Schuurmans, Mateusz Ostaszewski, Marlos C. Machado

TL;DR
This paper introduces a spectral regularizer that maintains neural network trainability during continual learning by controlling singular values, leading to improved performance and robustness across tasks.
Contribution
A novel spectral regularizer inspired by singular value properties at initialization, enhancing continual learning by preserving trainability and performance.
Findings
Spectral regularization sustains trainability across tasks.
It maintains gradient diversity during training.
It improves generalization in continual learning settings.
Abstract
Loss of plasticity is a phenomenon where neural networks can become more difficult to train over the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good performance while maintaining network trainability. We develop a new technique for improving continual learning inspired by the observation that the singular values of the neural network parameters at initialization are an important factor for trainability during early phases of learning. From this perspective, we derive a new spectral regularizer for continual learning that better sustains these beneficial initialization properties throughout training. In particular, the regularizer keeps the maximum singular value of each layer close to one. Spectral regularization directly ensures that gradient diversity is maintained throughout training, which promotes continual trainability, while…
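To make the idea of keeping each layer's maximum singular value close to one concrete, here is a minimal pure-Python sketch. It rests on assumptions not stated in the abstract: the penalty form `(sigma_max - 1)^2` summed over layers, the use of power iteration to estimate the maximum singular value, and the helper names `max_singular_value` and `spectral_penalty` are all illustrative, not the authors' exact formulation.

```python
import math
import random


def matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Compute W^T @ u without building the transpose."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def max_singular_value(W, iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W.

    Assumption: a fixed iteration count is good enough for illustration;
    convergence is slow when the top two singular values are close.
    """
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    n = norm(v)
    v = [x / n for x in v]
    for _ in range(iters):
        z = matvec_t(W, matvec(W, v))  # one step of v <- W^T W v
        nz = norm(z)
        if nz == 0.0:  # W maps v to zero (e.g. W is the zero matrix)
            return 0.0
        v = [x / nz for x in z]
    return norm(matvec(W, v))


def spectral_penalty(layers, iters=200):
    """Hypothetical penalty: sum over layers of (sigma_max(W) - 1)^2."""
    return sum((max_singular_value(W, iters) - 1.0) ** 2 for W in layers)


if __name__ == "__main__":
    layers = [
        [[3.0, 0.0], [0.0, 1.0]],  # sigma_max = 3
        [[1.0, 0.0], [0.0, 1.0]],  # sigma_max = 1, contributes no penalty
    ]
    print(spectral_penalty(layers))
```

In training, a term like this would typically be scaled by a coefficient and added to the task loss, with the gradient computed by an automatic-differentiation framework; that wiring is omitted here.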
Peer Reviews
Decision·ICLR 2025 Poster
- Continual learning is an important area, and loss of plasticity is a fundamental challenge.
- The approach is interesting, and the idea of regularizing the maximum singular value of each layer to stay close to one seems novel.
- The presentation of the work is nice, and the motivation is appropriate.
- The authors present results for several experiments, evaluating different properties.
- The improvement and novelty appear marginal relative to L2 regularization.
- There seems to be a connection to L2 regularization, or to L2 regularization towards the initial parameters, which is a sort of spectral regularization.
- A deeper analysis of the proposed regularizer versus the one above (L2 norm vs. spectral norm in this context) seems important and might be beneficial, but I was not able to find it in the paper.
- In the experimental section, looking at the graph and
This paper adds a spectral regularizer to the objective function to improve the stability of the learning problem.
However, the added spectral regularizer would introduce extra computational cost for continual learning with large models and large datasets.
- Section 1 (Introduction) and Section 2 (Problem Setting) are well written with clarity and detail, effectively introducing the problem, the spectral regularization algorithm, and some intuition behind its formulation.
- The spectral analysis of continual learning motivates the algorithm well.
- The experiments consider a wide swathe of existing continual learning algorithms, both ResNet-18 and Vision Transformer architectures, and a variety of datasets and non-stationarities. Over this set of exp
- Although spectral regularization is well motivated and the paper includes positive empirical results, one way to improve the paper would be to derive some theory illustrating the benefit of spectral regularization, and drawbacks of other regularization schemes, in continual learning.
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition
