GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen

TL;DR
This paper introduces GeRe, a simple and effective replay-based framework using general samples and activation state constraints to prevent catastrophic forgetting in continual learning of large language models, improving retention and performance.
Contribution
The paper proposes GeRe, a novel replay framework utilizing pretraining texts and activation state constraints, demonstrating its effectiveness in continual LLM learning.
Findings
TM loss improves performance and robustness
Small fixed set of samples suffices for anti-forgetting
Activation state constraints enhance replay stability
Abstract
The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continual fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by: 1) significant forgetting of their general capabilities, and 2) sharp performance declines in previously learned tasks. To simultaneously address both issues in a simple yet stable manner, we propose General Sample Replay (GeRe), a framework that use usual pretraining texts for efficient anti-forgetting. Beyond revisiting the most prevalent replay-based practices under GeRe, we further leverage neural states to introduce a enhanced activation states constrained optimization method using threshold-based margin (TM) loss, which maintains activation state consistency during replay learning. We are the first to validate that a small, fixed set of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
