CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
Yang Liu, Toan Nguyen, Flora D. Salim

TL;DR
CP-MoE introduces a novel continual learning framework for large models that reduces forgetting and enhances knowledge transfer by using a transient expert and consistency-preserving routing.
Contribution
It proposes a new method, CP-MoE, that effectively balances knowledge transfer and forgetting mitigation in continual learning for large language and vision-language models.
Findings
Achieves state-of-the-art performance on the SuperNI benchmark.
Effectively reduces forgetting on the multimodal VQA v2 dataset.
Outperforms existing MoE baselines in continual learning scenarios.
Abstract
Catastrophic forgetting remains a major obstacle to continual learning in large language models (LLMs) and vision-language models (VLMs). Although Mixture-of-Experts (MoE) architectures offer an efficient path to scaling, existing LoRA-based MoE continual learning methods still face a fundamental trade-off: they either isolate experts too aggressively, limiting knowledge transfer across tasks, or allow task-specific updates to overwrite important existing parameters, leading to severe forgetting. To address this, we propose CP-MoE, a continual learning framework built around a transient expert that captures early task-specific updates and guides their integration into stable experts. CP-MoE introduces a consistency-preserving routing bias, which uses the transient expert to estimate representation similarity with stable experts and steer routing towards more compatible expert…
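The abstract is cut off before it gives the exact form of the routing bias, so the following is only an illustrative sketch of the general idea: measure how similar the transient expert's representation is to each stable expert's, then add that similarity to the router logits so that more compatible experts receive more weight. The use of cosine similarity, the additive form of the bias, the scale `beta`, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def biased_routing(router_logits, transient_out, stable_outs, beta=1.0):
    # ASSUMPTION: the bias is beta * cosine(transient, stable_i), added to logit i.
    # The paper's actual similarity measure and bias form may differ.
    sims = [cosine(transient_out, s) for s in stable_outs]
    biased = [logit + beta * sim for logit, sim in zip(router_logits, sims)]
    return softmax(biased)

# Hypothetical toy input: two stable experts with equal base logits.
# The transient expert's output matches stable expert 0 and is orthogonal to expert 1.
probs = biased_routing(
    router_logits=[0.0, 0.0],
    transient_out=[1.0, 0.0],
    stable_outs=[[1.0, 0.0], [0.0, 1.0]],
)
print(probs)
```

In this toy case the router would otherwise split weight evenly between the two experts; with the bias, the expert whose representation aligns with the transient expert receives the larger routing probability.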
