Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning

Connor Mclaughlin; Nigel Lee; Lili Su

arXiv:2603.23436·cs.LG·March 25, 2026

Open Access

TL;DR

This paper introduces a similarity-aware mixture-of-experts framework for data-efficient continual learning. It targets the setting where each task may have limited data and tasks may overlap arbitrarily, using task similarity to transfer knowledge across tasks while avoiding negative transfer.

Contribution

The paper proposes an adaptive mixture-of-experts (MoE) approach that combines incremental global pooling with instance-wise prompt masking to improve knowledge transfer and task differentiation in continual learning.
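To make the idea of instance-wise masking more concrete, here is a minimal sketch. It is not the authors' method: this summary does not specify how the masking is computed. The sketch assumes each prompt has a key vector, that relevance is measured by cosine similarity to a per-instance query feature, and that prompts below a fixed threshold are masked out before normalizing. The function names, the threshold value, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def masked_prompt_weights(query, prompt_keys, threshold=0.5):
    """Return one weight per prompt for a single input instance.

    Assumption for illustration only: prompts whose key has cosine
    similarity below `threshold` are masked (weight 0); the remaining
    prompts share a softmax over their similarities.
    """
    sims = [cosine(query, key) for key in prompt_keys]
    keep = [s >= threshold for s in sims]
    if not any(keep):
        # No prompt is similar enough: mask everything.
        return [0.0] * len(prompt_keys)
    top = max(s for s, k in zip(sims, keep) if k)
    exps = [math.exp(s - top) if k else 0.0 for s, k in zip(sims, keep)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: three hypothetical prompt keys in a 2-D feature space.
weights = masked_prompt_weights(
    query=[1.0, 0.0],
    prompt_keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    threshold=0.5,
)
print(weights)
```

In this toy case the second prompt is orthogonal to the query, so it is masked, and the remaining weight is split between the first and third prompts in favor of the closer one. A per-instance mask of this kind is one plausible way to limit negative transfer from dissimilar tasks, though the paper's actual mechanism may differ.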

Findings

01. Enhances sample efficiency across various data volumes.

02. Effectively manages task overlap and prevents negative transfer.

03. Broad applicability demonstrated in experiments.

Abstract

Machine learning models often need to adapt to new data after deployment due to structured or unstructured real-world dynamics. The Continual Learning (CL) framework enables continuous model adaptation, but most existing approaches either assume each task contains sufficiently many data samples or that the learning tasks are non-overlapping. In this paper, we address the more general setting where each task may have a limited dataset, and tasks may overlap in an arbitrary manner without a priori knowledge. This general setting is substantially more challenging for two reasons. On the one hand, data scarcity necessitates effective contextualization of general knowledge and efficient knowledge transfer across tasks. On the other hand, unstructured task overlapping can easily result in negative knowledge transfer. To address the above challenges, we propose an adaptive mixture-of-experts…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis