Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning
JinLi He, Liang Bai, Xian Yang

TL;DR
This paper provides a theoretical analysis of how parameter update magnitude affects forgetting and generalization in continual learning, proposing a hybrid update strategy that improves performance.
Contribution
It introduces a formal framework linking update magnitude to knowledge degradation and develops an adaptive hybrid update method based on theoretical insights.
Findings
The paper derives the parameter update magnitude that minimizes forgetting.
Sequential tasks with small parameter distances generalize better.
The hybrid update strategy outperforms standard methods.
Abstract
The magnitude of parameter updates is considered a key factor in continual learning. However, most existing studies focus on designing diverse update strategies, while a theoretical understanding of the underlying mechanisms remains limited. Therefore, we characterize a model's forgetting from the perspective of parameter update magnitude and formalize it as knowledge degradation induced by task-specific drift in the parameter space, which has not been fully captured in previous studies due to their assumption of a unified parameter space. By deriving the optimal parameter update magnitude that minimizes forgetting, we unify two representative update paradigms, frozen training and initialized training, within an optimization framework for constrained parameter updates. Our theoretical results further reveal that sequence tasks with small parameter distances exhibit better generalization…
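To make the idea of a constrained parameter update concrete, here is a minimal sketch, not the authors' method. It assumes the constraint is an L2 ball of a chosen radius around the previous task's parameters, enforced by projected gradient descent. Under that assumption, a radius of zero behaves like frozen training, and an unbounded radius behaves like ordinary training started from the previous parameters, which is only one possible reading of "initialized training". The function name `constrained_update`, the `radius` parameter, and the toy quadratic loss are all hypothetical.

```python
import numpy as np

def constrained_update(theta_prev, grad_fn, radius, lr=0.1, steps=100):
    """Projected gradient descent on a new task (illustrative only).

    Keeps the parameters within an L2 ball of the given radius around
    theta_prev. The ball constraint is an assumption of this sketch,
    not a detail taken from the paper.
    """
    theta = theta_prev.copy()
    for _ in range(steps):
        # Plain gradient step on the new task's loss.
        theta = theta - lr * grad_fn(theta)
        # Project back onto the ball if the update drifted too far.
        delta = theta - theta_prev
        norm = np.linalg.norm(delta)
        if norm > radius:
            theta = theta_prev + delta * (radius / norm)
    return theta

# Toy new-task loss 0.5 * ||theta - target||^2, whose gradient is theta - target.
theta_prev = np.zeros(2)
target = np.array([3.0, 4.0])
grad = lambda th: th - target

frozen = constrained_update(theta_prev, grad, radius=0.0)      # no movement allowed
limited = constrained_update(theta_prev, grad, radius=1.0)     # bounded drift
free = constrained_update(theta_prev, grad, radius=np.inf)     # unconstrained
```

In this toy setting the radius directly controls how far the parameters may drift from the previous task's solution, which is the quantity the abstract ties to forgetting.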
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Face Recognition and Analysis
