How to Alleviate Catastrophic Forgetting in LLMs Finetuning? Hierarchical Layer-Wise and Element-Wise Regularization
Shezheng Song, Hao Xu, Jun Ma, Shasha Li, Long Peng, Qian Wan, Xiaodong Liu, Jie Yu

TL;DR
This paper introduces a hierarchical layer-wise and element-wise regularization method to mitigate catastrophic forgetting in LLM fine-tuning, improving knowledge retention and task adaptation efficiently.
Contribution
It proposes a novel element-wise importance computation and dynamic layer-wise coefficients for effective regularization during fine-tuning.
Findings
Mitigates catastrophic forgetting in LLMs while running about 20 times faster than previous methods.
Requires only 10-15% of the storage used by existing solutions.
Enhances model adaptability across scientific, medical, and physical tasks.
Abstract
Large Language Models (LLMs) exhibit strong general language capabilities. However, fine-tuning these models on domain-specific tasks often leads to catastrophic forgetting, where the model overwrites or loses essential knowledge acquired during pretraining. This phenomenon significantly limits the broader applicability of LLMs. To address this challenge, we propose a novel approach to compute the element-wise importance of model parameters crucial for preserving general knowledge during fine-tuning. Our method utilizes a dual-objective optimization strategy: (1) regularization loss based on element-wise parameter importance, which constrains the updates to parameters crucial for general knowledge; (2) cross-entropy loss to adapt to domain-specific tasks. Additionally, we introduce layer-wise coefficients to account for the varying contributions of different layers, dynamically…
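The dual-objective idea in the abstract can be illustrated with a minimal sketch. It assumes a quadratic penalty on the drift of each parameter from its pretrained value, weighted element-wise by importance and scaled per layer by a coefficient. The quadratic form, the function names, and the toy values are illustrative assumptions, not taken from the paper. The importance scores and layer coefficients are treated here as given inputs, because the abstract is cut off before it says how they are computed.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of a single example, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def regularization_loss(params, ref_params, importance, layer_coeffs):
    """Importance-weighted squared drift from the reference parameters.

    Assumed form (hypothetical, for illustration only):
        sum over layers l of  coeff_l * sum_i importance_i * (theta_i - ref_i)^2
    """
    total = 0.0
    for name, theta in params.items():
        layer_penalty = sum(
            w * (t - t0) ** 2
            for w, t, t0 in zip(importance[name], theta, ref_params[name])
        )
        total += layer_coeffs[name] * layer_penalty
    return total


def total_loss(logits, target, params, ref_params, importance, layer_coeffs):
    """Task loss (cross-entropy) plus the regularization term."""
    return cross_entropy(logits, target) + regularization_loss(
        params, ref_params, importance, layer_coeffs
    )


# Toy values, all hypothetical: two "layers" with a handful of parameters.
ref_params = {"layer0": [1.0, 2.0], "layer1": [0.5]}   # pretrained values
params = {"layer0": [1.5, 2.0], "layer1": [0.0]}       # values during fine-tuning
importance = {"layer0": [2.0, 1.0], "layer1": [4.0]}   # element-wise importance
layer_coeffs = {"layer0": 0.5, "layer1": 1.0}          # layer-wise coefficients

loss = total_loss([0.0, 0.0], 0, params, ref_params, importance, layer_coeffs)
print(f"total loss: {loss:.4f}")
```

In this sketch, a parameter with high importance that moves away from its pretrained value contributes a large penalty, while unimportant parameters remain free to adapt to the domain task.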
Taxonomy
Topics: Natural Language Processing Techniques
