FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning

Yujie Feng; Hao Wang; Jian Li; Xu Chu; Zhaolu Kang; Yiran Liu; Yasha Wang; Philip S. Yu; Xiao-Ming Wu

arXiv:2601.03938 · cs.LG · April 21, 2026

TL;DR

FOREVER is a memory replay framework for continual learning in language models. Inspired by the human forgetting curve, it times replay according to the model's internal learning progress rather than raw training steps, with the aim of reducing catastrophic forgetting.

Contribution

The paper proposes a model-centric replay scheduling method based on forgetting curves. Model time is measured by the magnitude of optimizer updates, and replay intervals follow that clock, to improve continual learning for large language models.

Findings

01

FOREVER outperforms existing methods on three benchmarks.

02

It effectively reduces catastrophic forgetting across models from 0.6B to 13B parameters.

03

The approach adapts replay intervals based on model evolution rather than fixed steps.

Abstract

Continual learning (CL) for large language models (LLMs) aims to enable sequential knowledge acquisition without catastrophic forgetting. Memory replay methods are widely used for their practicality and effectiveness, but most rely on fixed, step-based heuristics that often misalign with the model's actual learning progress, since identical training steps can result in varying degrees of parameter change. Motivated by recent findings that LLM forgetting mirrors the Ebbinghaus human forgetting curve, we propose FOREVER (FORgEtting curVe-inspired mEmory Replay), a novel CL framework that aligns replay schedules with a model-centric notion of time. FOREVER defines model time using the magnitude of optimizer updates, allowing forgetting curve-inspired replay intervals to align with the model's internal evolution rather than raw training steps. Building on this approach, FOREVER incorporates…
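The idea of a model-centric clock can be illustrated with a small sketch. It is not the authors' implementation: it assumes that model time is the cumulative L2 norm of the applied parameter updates, and that replay for a finished task is due at the illustrative offsets 1, 2, 4, 7, and 15 model-time units. The names `ModelClock`, `ReplayScheduler`, and `simulate`, the `unit` parameter, and the offsets are all hypothetical.

```python
import math


class ModelClock:
    """Tracks 'model time' as the running sum of update magnitudes (assumed definition)."""

    def __init__(self):
        self.time = 0.0

    def tick(self, update):
        # `update` is a flat list of parameter deltas applied in one optimizer step.
        self.time += math.sqrt(sum(d * d for d in update))
        return self.time


class ReplayScheduler:
    """Schedules replay at spaced offsets measured in model time, not steps."""

    def __init__(self, unit, offsets=(1, 2, 4, 7, 15)):
        self.unit = unit          # size of one model-time unit (hypothetical knob)
        self.offsets = offsets    # illustrative spaced-repetition offsets
        self.pending = {}         # task_id -> sorted list of due model times

    def register(self, task_id, now):
        # Called when a task finishes training; replay points start from `now`.
        self.pending[task_id] = [now + k * self.unit for k in self.offsets]

    def due(self, now):
        # Return the tasks whose next replay point has been reached.
        tasks = []
        for task_id, times in self.pending.items():
            fired = False
            while times and times[0] <= now:
                times.pop(0)
                fired = True
            if fired:
                tasks.append(task_id)
        return tasks


def simulate(step_norm, steps=10, unit=1.0):
    """Toy run: every step applies an update of the same magnitude."""
    clock = ModelClock()
    scheduler = ReplayScheduler(unit=unit)
    scheduler.register("task_A", now=clock.time)
    replay_steps = []
    for step in range(1, steps + 1):
        now = clock.tick([step_norm])  # one-parameter toy model
        if scheduler.due(now):
            replay_steps.append(step)
    return replay_steps


if __name__ == "__main__":
    print(simulate(1.0))  # large updates: replay is triggered early and often
    print(simulate(0.5))  # small updates: the same step count advances the clock less
```

In this toy setting, two runs with the same number of training steps trigger replay at different steps because their updates differ in size, which is the misalignment the abstract attributes to fixed, step-based schedules.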
