On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

Chongyang Zhao; Mingsong Li; Haodong Lu; Dong Gong

arXiv:2603.27481·cs.LG·March 31, 2026

TL;DR

This paper introduces LLaVA-DyMoE, a dynamic Mixture of Experts (MoE) framework with drift-aware token assignment for continual learning in large vision language models. It reduces forgetting of earlier tasks and improves accuracy.

Contribution

It proposes token-level assignment guidance and regularization techniques that mitigate routing-drift in MoE-based continual learning, improving model performance.

Findings

01. Achieves over 7% gain in mean final accuracy.

02. Reduces forgetting by 12% compared to baselines.

03. Effectively mitigates routing-drift-induced forgetting.

Abstract

Multimodal Continual Instruction Tuning aims to continually enhance Large Vision Language Models (LVLMs) by learning from new data without forgetting previously acquired knowledge. Mixture of Experts (MoE) architectures naturally facilitate this by incrementally adding new experts and expanding routers while keeping the existing ones frozen. However, despite expert isolation, MoE-based continual learners still suffer from forgetting due to routing-drift: old-task tokens become mistakenly attracted to newly added experts, degrading performance on prior tasks. We analyze the failure mode at the token level and reveal the token's dilemma: ambiguous and old tokens in new-task data offer minimal learning benefit yet induce forgetting when routed to new experts, due to their ambiguous routing assignment during training. Motivated by this, we propose LLaVA-DyMoE, a dynamic MoE framework that…
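
The routing-drift failure described in the abstract can be illustrated with a toy sketch. Everything below is a hypothetical illustration, not the LLaVA-DyMoE implementation: the `GrowingRouter` class, the 2-D token vectors, the top-1 softmax routing, and the `skip_if_old_score_above` filter are all assumptions made for this example. The filter is only a crude stand-in for the idea of keeping old or ambiguous tokens away from the new expert during training; the paper's actual assignment guidance and regularization are not reproduced here.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class GrowingRouter:
    """Toy top-1 router with one routing vector per expert (hypothetical)."""

    def __init__(self):
        self.weights = []  # one routing vector per expert
        self.frozen = []   # parallel flags: True means not trainable

    def add_expert(self, init):
        # Expanding the router freezes every existing routing vector.
        self.frozen = [True] * len(self.weights)
        self.weights.append(list(init))
        self.frozen.append(False)
        return len(self.weights) - 1

    def logits(self, token):
        return [dot(w, token) for w in self.weights]

    def route(self, token):
        scores = self.logits(token)
        return max(range(len(scores)), key=scores.__getitem__)

    def train_step(self, token, target, lr=0.5):
        # Gradient ascent on log p(target | token); frozen vectors are skipped.
        probs = softmax(self.logits(token))
        for i, w in enumerate(self.weights):
            if self.frozen[i]:
                continue
            grad = (1.0 if i == target else 0.0) - probs[i]
            for d in range(len(w)):
                w[d] += lr * grad * token[d]


def train_new_expert(tokens, epochs=20, skip_if_old_score_above=None):
    router = GrowingRouter()
    router.add_expert([1.0, 0.0])          # expert 0, learned on the old task
    new_idx = router.add_expert([0.0, 0.0])  # expert 1, added for the new task
    for _ in range(epochs):
        for tok in tokens:
            old_score = max(router.logits(tok)[:new_idx])
            # Assumed filter: tokens the frozen experts already score highly
            # are treated as old/ambiguous and not pulled toward the new expert.
            if skip_if_old_score_above is not None and old_score > skip_if_old_score_above:
                continue
            router.train_step(tok, new_idx)
    return router


old_token = [1.0, 0.0]        # token from the earlier task
ambiguous_token = [1.0, 0.2]  # new-task token that resembles old-task tokens
new_token = [0.0, 1.0]        # clearly new-task token
new_task_data = [ambiguous_token, new_token]

naive = train_new_expert(new_task_data)
guarded = train_new_expert(new_task_data, skip_if_old_score_above=0.5)

naive_route_old = naive.route(old_token)
guarded_route_old = guarded.route(old_token)
guarded_route_new = guarded.route(new_token)
print(naive_route_old, guarded_route_old, guarded_route_new)
```

In this toy setting, the naive run pulls the old-task token onto the newly added expert even though expert 0's routing vector is frozen, because the new routing vector was trained on an ambiguous token that points in nearly the same direction. The filtered run keeps the old-task token on expert 0 while still routing the clearly new token to expert 1.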

Code & Models

Repositories

https://zhaoc5.github.io/DyMoE
github
