Collaborative Parameter Learning: Mitigating Forgetting via Parameter-Level Gradient Analysis
Mutian Yang, Zisen Zhan, Yutong Chen, Haolin Li, Kaiwen Wang, Kaili Zheng, Yuguang Wang, Qi Wang, Jiandong Gao, Ji Wu

TL;DR
This paper introduces Collaborative Parameter Learning (CPL), a method that reduces catastrophic forgetting in large language models by updating only the parameters whose updates do not conflict with previously learned knowledge. CPL learns 20.2% to 48.2% more questions with negligible forgetting.
Contribution
The paper decomposes gradient similarity into parameter-wise contributions, identifies conflicting and collaborative parameters, and proposes CPL, which freezes the conflicting parameters and updates only the collaborative ones.
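The decomposition can be illustrated with a small sketch. It assumes gradient similarity is measured by the dot product between the gradient of the new-knowledge loss and the gradient of a loss on previously mastered knowledge, so each parameter contributes one elementwise product term. To first order, a gradient step on the new loss changes the old loss by minus the learning rate times that dot product, so negative terms push the old loss up. The function names, the strict sign threshold, and the plain SGD step are illustrative assumptions, not the authors' implementation. The summary does not say how CPL estimates the old-knowledge gradient or sets its selection criterion.

```python
import numpy as np

def split_parameters(g_new, g_old):
    """Split parameters by the sign of their term in the dot product g_new . g_old.

    Assumption: a positive term marks a 'collaborative' parameter and a
    negative term marks a 'conflicting' one. Zero terms fall in neither set.
    """
    contrib = g_new * g_old          # one term per parameter; sums to the dot product
    collaborative = contrib > 0
    conflicting = contrib < 0
    return contrib, collaborative, conflicting

def masked_sgd_step(theta, g_new, collaborative, lr=0.1):
    """Hypothetical update: step only on collaborative parameters, freeze the rest."""
    return theta - lr * g_new * collaborative

# Toy gradients (made-up numbers, for illustration only).
g_new = np.array([0.5, -1.0, 2.0, 0.3])   # gradient of the new-knowledge loss
g_old = np.array([1.0, 0.5, -1.0, 0.2])   # gradient of the old-knowledge loss
theta = np.zeros(4)

contrib, collab, conflict = split_parameters(g_new, g_old)
theta_next = masked_sgd_step(theta, g_new, collab)

# Similarity between the old gradient and the masked update direction.
masked_similarity = float(np.dot(g_old, g_new * collab))
```

In this toy case the full dot product is negative, meaning the unmasked step would raise the old-knowledge loss to first order, while the masked update direction has a positive dot product with the old-knowledge gradient.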
Findings
CPL learns 20.2% to 48.2% more questions with negligible forgetting.
CPL reduces peak VRAM by ~3 GB per billion parameters.
CPL decreases computation time by 16.5%.
Abstract
Catastrophic forgetting during knowledge injection impairs the ability of large language models to acquire new knowledge without overwriting previously mastered knowledge. Recent studies analyze forgetting from a gradient-similarity perspective and mitigate it through vector projection. However, these methods characterize gradient similarity primarily at the level of the aggregate direction, leaving the parameter-wise contributions to forgetting underexplored. In this paper, we decompose gradient similarity into parameter-wise contributions and identify two types of parameters during forgetting: Conflicting Parameters, whose updates contribute to forgetting and typically account for 50% to 75% of parameters, and Collaborative Parameters, whose updates mitigate forgetting and account for 25% to 50%. Based on this analysis, we propose Collaborative Parameter…
