Collaborative Parameter Learning: Mitigating Forgetting via Parameter-Level Gradient Analysis

Mutian Yang; Zisen Zhan; Yutong Chen; Haolin Li; Kaiwen Wang; Kaili Zheng; Yuguang Wang; Qi Wang; Jiandong Gao; Ji Wu

arXiv:2601.21577 · cs.LG · May 14, 2026

TL;DR

This paper introduces Collaborative Parameter Learning (CPL), a method that selectively updates parameters to reduce catastrophic forgetting during knowledge injection in large language models. CPL achieves significant improvements across various tasks.

Contribution

The paper decomposes gradient similarity into parameter-wise contributions, uses this decomposition to identify conflicting and collaborative parameters, and proposes CPL, which freezes the conflicting parameters while updating the collaborative ones.
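
One standard way to read "decomposing gradient similarity into parameter-wise contributions" is the first-order argument below. It is a sketch under stated assumptions, not the paper's exact formulation: it assumes a plain gradient step with step size η, a gradient g^new on the new knowledge, a gradient g^old on previously mastered knowledge, and d scalar parameters.

```latex
% Assumed: one plain gradient step on the new knowledge
\Delta\theta = -\eta\, g^{\mathrm{new}}

% First-order change in the loss on previously mastered knowledge
\Delta\mathcal{L}_{\mathrm{old}}
  \approx (g^{\mathrm{old}})^{\top}\Delta\theta
  = -\eta \sum_{i=1}^{d} g^{\mathrm{old}}_i\, g^{\mathrm{new}}_i

% Cosine similarity written as a sum of per-parameter terms c_i
\cos\big(g^{\mathrm{new}}, g^{\mathrm{old}}\big) = \sum_{i=1}^{d} c_i,
\qquad
c_i = \frac{g^{\mathrm{new}}_i\, g^{\mathrm{old}}_i}
           {\lVert g^{\mathrm{new}}\rVert\,\lVert g^{\mathrm{old}}\rVert}

% Hence \Delta\mathcal{L}_{\mathrm{old}} \approx
%   -\eta\,\lVert g^{\mathrm{new}}\rVert\,\lVert g^{\mathrm{old}}\rVert \sum_i c_i
% c_i < 0: updating parameter i raises L_old  (conflicting)
% c_i > 0: updating parameter i lowers L_old  (collaborative)
```

Under this reading, freezing every parameter with c_i < 0 removes exactly the terms that push the first-order estimate of forgetting upward.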

Findings

1. CPL learns 20.2% to 48.2% more questions with negligible forgetting.
2. CPL reduces peak VRAM by ~3 GB per billion parameters.
3. CPL decreases computation time by 16.5%.

Abstract

Catastrophic forgetting during knowledge injection impairs the ability of large language models to acquire new knowledge without overwriting previously mastered knowledge. Recent studies analyze forgetting from a gradient-similarity perspective and mitigate it through vector projection. However, these methods primarily characterize gradient similarity at the level of the aggregate gradient direction, leaving the parameter-wise contributions to forgetting underexplored. In this paper, we decompose gradient similarity into parameter-wise contributions and identify two types of parameters during forgetting: Conflicting Parameters, whose updates contribute to forgetting and typically account for 50% to 75% of parameters, and Collaborative Parameters, whose updates mitigate forgetting and account for 25% to 50%. Based on this analysis, we propose Collaborative Parameter…
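
A minimal NumPy sketch of the conflicting/collaborative split and a CPL-style masked update follows. It assumes flattened gradient vectors for the new and the previously mastered knowledge are already available, that plain SGD is the optimizer, and that classification happens per scalar parameter; all three are assumptions, and the function names (`parameter_contributions`, `collaborative_mask`, `masked_sgd_step`) are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def parameter_contributions(g_new: np.ndarray, g_old: np.ndarray) -> np.ndarray:
    """Per-parameter terms c_i whose sum equals cos(g_new, g_old).

    Assumption: g_new is the gradient on the new knowledge and g_old the
    gradient on previously mastered knowledge, both flattened to 1-D.
    """
    denom = np.linalg.norm(g_new) * np.linalg.norm(g_old)
    if denom == 0.0:
        # Degenerate case: a zero gradient has no direction to compare.
        return np.zeros(g_new.shape, dtype=float)
    return (g_new * g_old) / denom


def collaborative_mask(g_new: np.ndarray, g_old: np.ndarray) -> np.ndarray:
    """True for collaborative parameters (c_i > 0); False marks parameters to freeze.

    c_i < 0 marks a conflicting parameter. Parameters with c_i == 0 are also
    frozen here, which is a choice made for this sketch only.
    """
    return parameter_contributions(g_new, g_old) > 0


def masked_sgd_step(
    theta: np.ndarray, g_new: np.ndarray, g_old: np.ndarray, lr: float = 0.1
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical CPL-style step: plain SGD applied to collaborative parameters only."""
    mask = collaborative_mask(g_new, g_old)
    return theta - lr * g_new * mask, mask


# Toy example with made-up gradients (not data from the paper).
g_new = np.array([0.5, -1.0, 2.0, 0.0])
g_old = np.array([1.0, 1.0, -0.5, 3.0])
theta = np.zeros(4)

full_step = theta - 0.1 * g_new
masked_step, mask = masked_sgd_step(theta, g_new, g_old, lr=0.1)

# First-order change in the old-knowledge loss: g_old . (theta_after - theta)
print(mask.tolist())                                  # only the first parameter is updated
print(round(float(g_old @ (full_step - theta)), 4))   # positive: the full step causes forgetting
print(round(float(g_old @ (masked_step - theta)), 4)) # negative: the masked step does not
```

In this toy case the unmasked step raises the first-order estimate of the old loss, while updating only the collaborative parameter lowers it. Restricting updates to the mask is also what would let a real implementation skip gradients and optimizer state for frozen parameters, although the source does not say how CPL obtains its memory and time savings.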