Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning

JinLi He; Liang Bai; Xian Yang

arXiv:2602.20796 · cs.LG · February 25, 2026

Open Access

TL;DR

This paper provides a theoretical analysis of how parameter update magnitude affects forgetting and generalization in continual learning, proposing a hybrid update strategy that improves performance.

Contribution

It introduces a formal framework linking update magnitude to knowledge degradation and develops an adaptive hybrid update method based on theoretical insights.

Findings

01. An optimal parameter update magnitude exists that minimizes forgetting.

02. Sequential tasks separated by small parameter distances generalize better.

03. The hybrid update strategy outperforms standard methods.

Abstract

The magnitude of parameter updates is considered a key factor in continual learning. However, most existing studies focus on designing diverse update strategies, while theoretical understanding of the underlying mechanisms remains limited. We therefore characterize a model's forgetting from the perspective of parameter update magnitude and formalize it as knowledge degradation induced by task-specific drift in the parameter space, which previous studies have not fully captured because they assume a unified parameter space. By deriving the optimal parameter update magnitude that minimizes forgetting, we unify two representative update paradigms, frozen training and initialized training, within an optimization framework for constrained parameter updates. Our theoretical results further reveal that sequential tasks with small parameter distances exhibit better generalization…
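To make the idea of a constrained parameter update concrete, here is a minimal sketch. It is not the paper's method: it assumes the constraint is an L2 ball of radius `max_norm` around the previous task's parameters, and the names `clip_update`, `update_magnitude`, and `max_norm` are hypothetical. Under that assumption, a radius of zero leaves the parameters frozen and an infinite radius leaves the update unconstrained. How the paper maps its two paradigms onto its own constraint is not stated in the abstract.

```python
import math


def update_magnitude(theta_prev, theta_new):
    """L2 distance between two parameter vectors (the update magnitude)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(theta_new, theta_prev)))


def clip_update(theta_prev, theta_proposed, max_norm):
    """Project theta_proposed onto the L2 ball of radius max_norm around theta_prev.

    Assumption for illustration only: the constraint is an L2 ball.
    """
    delta = [p - q for p, q in zip(theta_proposed, theta_prev)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= max_norm:
        # Proposed update already satisfies the constraint.
        return list(theta_proposed)
    scale = max_norm / norm  # shrink the step so its length equals max_norm
    return [q + scale * d for q, d in zip(theta_prev, delta)]


theta_old = [1.0, 2.0]   # parameters after the previous task
theta_task = [4.0, 6.0]  # unconstrained solution for the new task (distance 5)

frozen = clip_update(theta_old, theta_task, 0.0)            # no movement allowed
partial = clip_update(theta_old, theta_task, 2.5)           # half the full step
free = clip_update(theta_old, theta_task, float("inf"))     # no constraint

print(frozen, partial, free)
print(update_magnitude(theta_old, partial))
```

In this toy setting the radius is the single knob that interpolates between keeping old parameters and adopting the new task's solution. A practical strategy would have to choose it per task or per layer.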

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Face recognition and analysis