How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation
Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, Robert P. Dick

TL;DR
This paper explains how quadratic regularizers prevent catastrophic forgetting in neural networks: they interpolate between the current and previous values of model parameters at every training iteration. It also proposes a simple modification that improves their stability and effectiveness, validated through extensive experiments.
Contribution
The paper explains in detail how quadratic regularizers prevent forgetting and introduces a simple modification that improves their performance and stability.
Findings
Quadratic regularizers prevent forgetting by interpolating the current and previous values of model parameters at every training iteration.
The modification improves accuracy by 6.2% and reduces forgetting by 4.5%.
Results are validated across 2000 models in various settings.
Abstract
Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper has the goal of better explaining a popularly used technique for avoiding catastrophic forgetting: quadratic regularization. We show that quadratic regularizers prevent forgetting of past tasks by interpolating current and previous values of model parameters at every training iteration. Over multiple training iterations, this interpolation operation reduces the learning rates of more important model parameters, thereby minimizing their movement. Our analysis also reveals two drawbacks of quadratic regularization: (a) dependence of parameter interpolation on training hyperparameters, which often leads to…
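The interpolation view described in the abstract can be illustrated with a small numerical sketch. It assumes a generic quadratic penalty of the form (lam / 2) * sum(importance * (theta - theta_prev)**2) with per-parameter importance weights and plain SGD; this form, and every name below (`regularized_sgd_step`, `interpolated_sgd_step`, `importance`, `lam`), are illustrative assumptions rather than the paper's exact formulation. Under those assumptions, one regularized gradient step is algebraically the same as first interpolating each parameter toward its previous value and then taking an ordinary gradient step.

```python
import numpy as np


def regularized_sgd_step(theta, theta_prev, grad, importance, lr, lam):
    """One SGD step on: task_loss + (lam / 2) * sum(importance * (theta - theta_prev)**2).

    `grad` is the gradient of the task loss alone; the penalty gradient is added here.
    """
    penalty_grad = lam * importance * (theta - theta_prev)
    return theta - lr * (grad + penalty_grad)


def interpolated_sgd_step(theta, theta_prev, grad, importance, lr, lam):
    """The same update, rewritten as interpolation followed by a plain gradient step."""
    # Per-parameter interpolation weight (hypothetical name). It depends on the
    # learning rate and the regularization strength, not only on importance.
    alpha = lr * lam * importance
    return (1.0 - alpha) * theta + alpha * theta_prev - lr * grad


# Random toy values; none of these numbers come from the paper.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)        # current parameters
theta_prev = rng.normal(size=5)   # parameters learned on the previous task
grad = rng.normal(size=5)         # gradient of the new task's loss
importance = rng.uniform(0.0, 1.0, size=5)
lr, lam = 0.1, 2.0

step_a = regularized_sgd_step(theta, theta_prev, grad, importance, lr, lam)
step_b = interpolated_sgd_step(theta, theta_prev, grad, importance, lr, lam)
steps_match = bool(np.allclose(step_a, step_b))
print(steps_match)
```

In this sketch the interpolation weight is `lr * lam * importance`, so parameters with larger importance are pulled harder toward their previous values. The weight also changes whenever the learning rate or regularization strength changes, which is one way to read the hyperparameter dependence the abstract lists as drawback (a).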
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
