GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

Yunan Zhang; Shuoran Jiang; Mengchen Zhao; Yuefeng Li; Yang Fan; Xiangping Wu; Qingcai Chen

arXiv:2508.04676·cs.CL·August 7, 2025

TL;DR

This paper introduces GeRe, a simple and effective replay-based framework using general samples and activation state constraints to prevent catastrophic forgetting in continual learning of large language models, improving retention and performance.

Contribution

The paper proposes GeRe, a novel replay framework utilizing pretraining texts and activation state constraints, demonstrating its effectiveness in continual LLM learning.

Findings

01 Threshold-based margin (TM) loss improves performance and robustness

02 A small, fixed set of general samples suffices for anti-forgetting

03 Activation state constraints enhance replay stability

Abstract

The continual learning capability of large language models (LLMs) is crucial for advancing artificial general intelligence. However, continually fine-tuning LLMs across various domains often suffers from catastrophic forgetting, characterized by: 1) significant forgetting of their general capabilities, and 2) sharp performance declines on previously learned tasks. To address both issues simultaneously in a simple yet stable manner, we propose General Sample Replay (GeRe), a framework that uses ordinary pretraining texts for efficient anti-forgetting. Beyond revisiting the most prevalent replay-based practices under GeRe, we further leverage neural states to introduce an enhanced activation-state-constrained optimization method using a threshold-based margin (TM) loss, which maintains activation state consistency during replay learning. We are the first to validate that a small, fixed set of…
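The two ideas named in the abstract, replaying general samples alongside task data and penalizing activations that drift out of their reference state, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the three-state discretization, the thresholds `lo` and `hi`, and the hinge form of the penalty are hypothetical and are not taken from the paper, whose exact formulation is not given in this summary.

```python
import random

def activation_states(acts, lo, hi):
    """Map each activation to -1 (below lo), +1 (above hi), or 0 (in between).
    Hypothetical discretization; the paper's thresholds may be defined differently."""
    return [1 if a > hi else -1 if a < lo else 0 for a in acts]

def margin_penalty(acts, states, lo, hi):
    """Mean hinge penalty that is zero when every activation stays on the
    side of the thresholds that its reference state requires."""
    total = 0.0
    for a, s in zip(acts, states):
        if s == 1:        # reference was above hi: penalize falling below hi
            total += max(0.0, hi - a)
        elif s == -1:     # reference was below lo: penalize rising above lo
            total += max(0.0, a - lo)
        else:             # reference was inside [lo, hi]: penalize leaving it
            total += max(0.0, a - hi) + max(0.0, lo - a)
    return total / len(acts)

def mixed_batch(task_batch, replay_pool, n_replay, rng):
    """Append n_replay samples drawn without replacement from a fixed pool
    of general texts to the current task batch."""
    return list(task_batch) + rng.sample(replay_pool, n_replay)

# Reference activations recorded on a replay sample before fine-tuning (made-up numbers).
reference = [-2.0, 0.1, 3.0]
states = activation_states(reference, lo=-1.0, hi=1.0)

# Activations on the same sample after some fine-tuning steps (made-up numbers).
drifted = [-0.5, 0.1, 0.5]
penalty = margin_penalty(drifted, states, lo=-1.0, hi=1.0)
print(states, penalty)
```

In this sketch the penalty would be added to the usual language-modeling loss on the replayed samples only, so task samples are trained normally while general samples anchor the activation states.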
