Beyond Retention: Orchestrating Structural Safety and Plasticity in Continual Learning for LLMs
Fei Meng

TL;DR
This paper investigates the limitations of Experience Replay in continual learning for LLMs, revealing a trade-off between stability and plasticity, and proposes a novel method, OSW, to better preserve fragile knowledge while learning new tasks.
Contribution
The paper introduces Orthogonal Subspace Wake-up (OSW), a new approach that enforces orthogonal updates to preserve previous-task knowledge, addressing ER's limitations in fragile domains.
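The idea of orthogonal updates can be illustrated with a minimal gradient-projection sketch. It assumes the knowledge to protect is already summarized as a few direction vectors in a flattened parameter space. The summary does not describe how OSW actually identifies its essential subspaces, so this is the generic technique, not the paper's method, and the helper names (`orthonormalize`, `project_out`, `sgd_step`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: List[Vector], eps: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: build an orthonormal basis spanning `vectors`,
    dropping directions that are (numerically) linearly dependent."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of `grad` lying in span(basis), so the
    resulting update is orthogonal to the protected (hypothetical) subspace."""
    g = list(grad)
    for u in basis:
        c = dot(g, u)
        g = [gi - c * ui for gi, ui in zip(g, u)]
    return g


def sgd_step(params: Vector, grad: Vector, basis: List[Vector], lr: float = 0.1) -> Vector:
    """One plain SGD step using the projected gradient."""
    g = project_out(grad, basis)
    return [p - lr * gi for p, gi in zip(params, g)]


if __name__ == "__main__":
    protected = orthonormalize([[1.0, 0.0, 0.0]])
    print(project_out([0.5, 2.0, -1.0], protected))
```

Projecting the gradient, rather than freezing parameters outright, is what leaves room for plasticity: the step is blocked only along the protected directions and can still move freely everywhere outside that span.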
Findings
ER improves performance on unstructured tasks but harms structured tasks like code generation.
OSW effectively preserves coding abilities while maintaining plasticity for new tasks.
Empirical results show OSW outperforms ER in diverse continual learning scenarios.
Abstract
Continual learning in Large Language Models (LLMs) faces the critical challenge of balancing stability (retaining old knowledge) and plasticity (learning new tasks). While Experience Replay (ER) is a standard countermeasure against catastrophic forgetting, its impact across diverse capabilities remains underexplored. In this work, we uncover a critical dichotomy in ER's behavior: while it induces positive backward transfer on robust, unstructured tasks (e.g., boosting performance on previous NLP classification tasks through repeated rehearsal), it causes severe negative transfer on fragile, structured domains like code generation (e.g., a significant relative drop in coding accuracy). This reveals that ER trades structural integrity for broad consolidation. To address this dilemma, we propose Orthogonal Subspace Wake-up (OSW). OSW identifies essential parameter subspaces of…
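The abstract describes ER's effect in terms of backward transfer (positive on some earlier tasks, negative on others) and a relative drop in accuracy. The sketch below shows how these quantities are commonly computed. It assumes the widely used accuracy-matrix definition of backward transfer popularized by Gradient Episodic Memory (Lopez-Paz and Ranzato, 2017). The paper's exact metric is not stated in this excerpt, and the accuracy values are invented toy numbers, not results.

```python
from typing import List


def backward_transfer(R: List[List[float]]) -> float:
    """BWT = 1/(T-1) * sum_{i < T-1} (R[T-1][i] - R[i][i]), 0-indexed.

    R[i][j] is accuracy on task j measured right after training on task i
    (a T x T matrix; entries for not-yet-seen tasks are unused).
    Positive BWT: later training helped earlier tasks.
    Negative BWT: forgetting / negative transfer.
    """
    T = len(R)
    if T < 2:
        raise ValueError("backward transfer needs at least two tasks")
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def relative_change(R: List[List[float]], task: int) -> float:
    """Relative change on one task between the moment it was learned
    and the end of the task sequence (negative means a relative drop)."""
    before = R[task][task]
    after = R[len(R) - 1][task]
    return (after - before) / before


if __name__ == "__main__":
    # Toy accuracy matrix for three hypothetical tasks (0.0 = not yet trained).
    R = [
        [0.50, 0.00, 0.00],
        [0.55, 0.70, 0.00],
        [0.60, 0.56, 0.80],
    ]
    print(backward_transfer(R), relative_change(R, 0), relative_change(R, 1))
```

With these toy numbers, task 0 gains 20% relative while task 1 loses 20% relative, yet the averaged BWT is only slightly negative (about -0.02). An aggregate score can therefore hide a sharp drop on one capability, which is why per-task changes are worth inspecting alongside it.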
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
