Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning

Jialu Zhou; Dianxi Shi; Shaowu Yang; Xinyu Wei; Mingyue Yang; Leqian Li; Mengzhu Wang; Chunping Qiu

arXiv:2508.07738 · cs.LG · August 12, 2025

TL;DR

This paper introduces TRGE, a two-level routing mixture-of-experts approach for multi-domain continual learning that dynamically expands the model, mitigates forgetting, and enhances task collaboration, using multimodal large language models for task identification.

Contribution

The paper proposes a dynamic two-level routing mixture-of-experts framework with intra- and inter-group routing, and leverages multimodal large language models for task identification, advancing multi-domain continual learning.

Findings

01 Outperforms existing methods in various settings.

02 Reduces catastrophic and forward forgetting effectively.

03 Uses fewer trainable parameters than comparable approaches.

Abstract

Multi-Domain Continual Learning (MDCL) acquires knowledge from sequential tasks with shifting class sets and distributions. Although Parameter-Efficient Fine-Tuning (PEFT) methods can adapt to this dual heterogeneity, they still suffer from catastrophic forgetting and forward forgetting. To address these challenges, we propose a Two-Level Routing Grouped Mixture-of-Experts (TRGE) method. First, TRGE dynamically expands the pre-trained CLIP model, assigning a specific expert group to each task to mitigate catastrophic forgetting. As the number of experts grows continually in this process, TRGE keeps the expert count within each group fixed and introduces an intra-group router to alleviate the routing overfitting caused by increasing routing complexity. Meanwhile, we design an inter-group routing policy based on task identifiers and task prototype distance, which dynamically…
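To make the two routing levels concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. The abstract is cut off before it says how the inter-group policy uses task identifiers and prototype distance, so choosing the group with the nearest prototype is purely an illustrative assumption, as are the linear top-k gate inside each group and all names (`nearest_group`, `intra_group_route`, `route`) and toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def nearest_group(feature, prototypes):
    """Inter-group step (ASSUMPTION): pick the group whose task
    prototype is closest to the input feature in Euclidean distance."""
    dists = [math.dist(feature, p) for p in prototypes]
    return dists.index(min(dists))

def intra_group_route(feature, router_weights, top_k):
    """Intra-group step (ASSUMPTION: linear gate with top-k selection).
    The group has a fixed number of experts, one weight row per expert."""
    logits = [sum(w * x for w, x in zip(row, feature)) for row in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])
    return dict(zip(top, gates))  # expert index -> gate weight

def route(feature, prototypes, group_routers, top_k=2):
    """Two-level routing: choose a group first, then experts inside it."""
    g = nearest_group(feature, prototypes)
    return g, intra_group_route(feature, group_routers[g], top_k)

# Hypothetical toy setup: two task groups, three experts per group.
prototypes = [[0.0, 0.0], [5.0, 5.0]]
group_routers = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    [[0.5, 0.5], [1.0, -1.0], [-1.0, 1.0]],
]
group, gates = route([4.0, 6.0], prototypes, group_routers)
```

In this sketch, adding a task appends one prototype and one router, while each intra-group router still scores only its own fixed set of experts. That is one way to read the abstract's point that the per-group expert count stays fixed as the total number of experts grows.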

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications