Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs

Fei Meng

arXiv:2601.18255 · cs.LG · January 27, 2026

TL;DR

This paper investigates the limitations of Experience Replay in continual learning for LLMs, revealing a trade-off between stability and plasticity, and proposes a novel method, OSW, to better preserve fragile knowledge while learning new tasks.

Contribution

The paper introduces Orthogonal Subspace Wake-up (OSW), a new approach that enforces orthogonal updates to preserve previous task knowledge, addressing ER's limitations in fragile domains.
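
To make "orthogonal updates" concrete, here is a minimal sketch of the general idea: a candidate update is stripped of its components along a set of protected directions before being applied. This is not the paper's implementation. How OSW actually identifies the protected subspace is not described in the text available here, so the directions below are simply given as inputs, and the names `orthonormal_basis` and `project_out` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: build an orthonormal basis for the span of `vectors`.

    Assumption: `vectors` stands in for directions deemed important to
    earlier tasks; how they are chosen is outside this sketch.
    """
    basis = []
    for v in vectors:
        w = list(v)
        # Remove the components already covered by the basis so far.
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that add no new direction
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update, basis):
    """Return the part of `update` orthogonal to every vector in `basis`."""
    g = list(update)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Two protected directions in a 3-dimensional parameter space.
protected = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(protected)

raw_update = [3.0, 4.0, 5.0]
safe_update = project_out(raw_update, basis)
```

In this toy case the protected directions span the first two coordinates, so only the third coordinate of the update survives. The projected update has zero inner product with every protected direction, which is the property an orthogonality constraint is meant to guarantee.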

Findings

01. ER improves performance on unstructured tasks but harms structured tasks like code generation.

02. OSW effectively preserves coding abilities while maintaining plasticity for new tasks.

03. Empirical results show OSW outperforms ER in diverse continual learning scenarios.

Abstract

Continual learning in Large Language Models (LLMs) faces the critical challenge of balancing stability (retaining old knowledge) and plasticity (learning new tasks). While Experience Replay (ER) is a standard countermeasure against catastrophic forgetting, its impact across diverse capabilities remains underexplored. In this work, we uncover a critical dichotomy in ER's behavior: while it induces positive backward transfer on robust, unstructured tasks (e.g., boosting performance on previous NLP classification tasks through repeated rehearsal), it causes severe negative transfer on fragile, structured domains like code generation (e.g., a significant relative drop in coding accuracy). This reveals that ER trades structural integrity for broad consolidation. To address this dilemma, we propose Orthogonal Subspace Wake-up (OSW). OSW identifies essential parameter subspaces of…
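
For readers unfamiliar with Experience Replay, the sketch below shows one common form of it: a fixed-size buffer of earlier-task examples, filled by reservoir sampling, from which a few items are mixed into each new-task batch. It is a generic illustration, not the setup evaluated in the paper. The class `ReplayBuffer`, the function `mixed_batch`, and the `replay_fraction` parameter are hypothetical names, and the buffer size and mixing ratio are arbitrary.

```python
import random


class ReplayBuffer:
    """Fixed-size store of past examples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep each example seen so far with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k stored examples without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_examples, buffer, replay_fraction=0.25):
    """Append replayed examples to a batch of new-task examples."""
    n_replay = int(len(new_examples) * replay_fraction)
    return list(new_examples) + buffer.sample(n_replay)


# Store examples from an earlier task, then build a batch for a new one.
buffer = ReplayBuffer(capacity=4)
for i in range(10):
    buffer.add(("old_task", i))

new_batch = [("new_task", i) for i in range(8)]
batch = mixed_batch(new_batch, buffer, replay_fraction=0.5)
```

With eight new examples and a replay fraction of 0.5, the resulting batch holds the eight new examples followed by four replayed ones.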


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications