Stable Routing for Mixture-of-Experts in Class-Incremental Learning

Zirui Guo; Quan Cheng; Da-Wei Zhou; Lijun Zhang

arXiv:2605.17571 · cs.CV · May 19, 2026

TL;DR

This paper introduces StaR-MoE, a routing-level framework that stabilizes expert routing in mixture-of-experts models for class-incremental learning, improving knowledge retention and adaptation.

Contribution

The paper proposes sensitivity-aware routing alignment and asymmetric capacity regularization to enhance expert stability and capacity utilization in class-incremental learning (CIL).

Findings

01. StaR-MoE outperforms state-of-the-art methods on four CIL benchmarks.

02. Stable routing significantly improves both average and last accuracy.

03. The framework effectively balances old-class knowledge preservation and new-class learning.

Abstract

Class-incremental learning (CIL) requires models to learn new classes sequentially while preserving prior knowledge. Recently, approaches that combine pre-trained models with mixture-of-experts (MoE) have received increasing attention in CIL: they typically expand experts during learning and employ a router to assign weights across experts. However, existing MoE methods often overlook routing drift induced by expert expansion. Once new experts are introduced, the router may reassign samples from earlier classes to newly added experts, thereby perturbing previously established expert compositions and causing interference even when old experts remain frozen. We argue that expandable MoE in CIL requires two complementary properties: stable old-class routing for knowledge preservation and sufficient capacity utilization for new-class adaptation. To this end, we propose Stable Routing for…
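
The routing drift described in the abstract can be illustrated with a toy sketch. This is not the StaR-MoE method and not the authors' code: it assumes a dense softmax router with one linear column per expert, and every name in it (`route`, `W_old`, `W_add`, `drift_mass`) is hypothetical. It only shows that appending router columns for new experts changes the mixture weights of old-class samples, even though nothing about the old experts or their router columns was modified.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def route(x, W):
    # Mixture weights over experts for features x (n, d) and router W (d, k).
    # Assumption: a dense softmax router; the paper's router may differ.
    return softmax(x @ W)

rng = np.random.default_rng(0)
d, k_old, k_new = 8, 3, 2

x_old = rng.normal(size=(5, d))      # features of old-class samples
W_old = rng.normal(size=(d, k_old))  # router columns for the frozen old experts
W_add = rng.normal(size=(d, k_new))  # router columns for newly added experts

before = route(x_old, W_old)
after = route(x_old, np.concatenate([W_old, W_add], axis=1))

# Fraction of each old-class sample's routing weight now sent to new experts.
drift_mass = after[:, k_old:].sum(axis=1)

# Under softmax, the old experts keep their relative proportions, but their
# weights shrink by the factor (1 - drift_mass).
shrunk = before * (1.0 - drift_mass[:, None])
print("routing mass moved to new experts:", np.round(drift_mass, 3))
```

In this toy setting the expert composition seen by an old-class sample shifts purely because the router's output space grew, which is the interference the abstract attributes to expert expansion. How StaR-MoE counteracts it is not reproduced here.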
