FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

Xing Han; Shravan Chaudhari; Tanvi Ranade; Rama Chellappa; Suchi Saria

arXiv:2605.09355 · cs.LG · May 12, 2026

TL;DR

This paper introduces FLAME, a scalable mixture-of-experts framework for continual multimodal multi-task learning that supports flexible modality combinations and mitigates catastrophic forgetting.
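
The summary says FLAME handles flexible modality combinations through modality-specific routers. This page does not describe how those routers are built, so what follows is only a minimal NumPy sketch under stated assumptions. The class `ModalityRoutedMoE` is hypothetical. Each modality gets its own gating matrix over a shared pool of experts, and each token is sent to its top-k experts with a softmax renormalized over the selected experts. Any subset of modalities can then be fused by mean pooling. The single-matrix experts and the mean-pool fusion are illustrative choices, not the paper's method.

```python
import numpy as np


class ModalityRoutedMoE:
    """Hypothetical sparse MoE layer with one router per modality (illustrative sketch only)."""

    def __init__(self, modalities, d_model, n_experts, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Shared pool of experts; each expert is a single linear map here for brevity.
        self.experts = [rng.normal(0, d_model ** -0.5, (d_model, d_model)) for _ in range(n_experts)]
        # One gating matrix per modality, so the routing decision depends on the input's modality.
        self.routers = {m: rng.normal(0, d_model ** -0.5, (d_model, n_experts)) for m in modalities}

    def _route(self, x, modality):
        """Send tokens x of shape (n_tokens, d_model) to the top-k experts picked by this modality's router."""
        logits = x @ self.routers[modality]                    # (n_tokens, n_experts)
        top = np.argsort(logits, axis=-1)[:, -self.top_k:]     # indices of the k largest logits per token
        top_logits = np.take_along_axis(logits, top, axis=-1)
        gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
        gates /= gates.sum(axis=-1, keepdims=True)             # softmax over the selected experts only
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for j in range(self.top_k):
                out[t] += gates[t, j] * (x[t] @ self.experts[top[t, j]])
        return out, top

    def __call__(self, inputs):
        """inputs maps modality -> (n_tokens, d_model); any subset of modalities may be present."""
        routed = [self._route(x, m)[0] for m, x in inputs.items()]
        # Hypothetical fusion: mean over tokens within each modality, then mean across modalities.
        return np.mean([h.mean(axis=0) for h in routed], axis=0)


layer = ModalityRoutedMoE(["image", "text", "audio"], d_model=8, n_experts=4, top_k=2)
rng = np.random.default_rng(1)
z_image_text = layer({"image": rng.normal(size=(5, 8)), "text": rng.normal(size=(3, 8))})
z_audio_only = layer({"audio": rng.normal(size=(7, 8))})
print(z_image_text.shape, z_audio_only.shape)  # (8,) (8,)
```

Both calls produce a fused vector of the same width. This is what lets a downstream task head accept image+text or audio alone without changing its architecture.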

Contribution

It proposes a novel MoE-based approach with modality-specific routers and low-rank memory compression for efficient multitask pretraining and continual learning.
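
The page names "low-rank memory compression" but does not say what is stored or how it is factorized. The sketch below assumes one plausible reading: a stored d×d matrix, such as a per-task weight delta, is replaced by two rank-r factors obtained from a truncated SVD. By the Eckart-Young theorem, this gives the best rank-r approximation in Frobenius norm. The names `compress_low_rank`, `relative_error`, and `memory` are hypothetical. The arithmetic shows where the parameter savings come from: d² = 4096 entries shrink to 2dr = 512 for d = 64, r = 4.

```python
import numpy as np


def compress_low_rank(M, rank):
    """Return factors (A, B) such that A @ B is the best rank-`rank` approximation of M."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (m, r): leading left singular vectors scaled by singular values
    B = Vt[:rank, :]             # (r, n): leading right singular vectors
    return A, B


def relative_error(M, A, B):
    """Frobenius-norm error of the reconstruction, relative to the norm of M."""
    return np.linalg.norm(M - A @ B) / np.linalg.norm(M)


rng = np.random.default_rng(0)
d, true_rank = 64, 4
# Hypothetical "memory": a stored matrix that is approximately low-rank plus small noise.
memory = rng.normal(size=(d, true_rank)) @ rng.normal(size=(true_rank, d))
memory += 1e-3 * rng.normal(size=(d, d))

A, B = compress_low_rank(memory, rank=4)
full_params = memory.size            # 64 * 64 = 4096
low_rank_params = A.size + B.size    # 64 * 4 + 4 * 64 = 512
print(full_params, low_rank_params, relative_error(memory, A, B) < 1e-2)
```

The savings only hold when the stored matrix is close to low-rank. For a full-rank matrix, the discarded singular values set a floor on the reconstruction error.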

Findings

01. Achieves competitive multitask pretraining performance.
02. Reduces catastrophic forgetting in continual learning.
03. Improves parameter efficiency for multimodal tasks.

Abstract

Real-world model deployment across multiple domains requires multimodal models to operate under two complementary regimes: (1) multi-task pretraining, in which tasks are co-available at design time and related tasks can borrow representational strength from one another, and (2) continual adaptation, in which new tasks emerge after deployment with previously unseen modality combinations. However, neither regime alone suffices: the pretraining task set is never exhaustive, while bypassing joint training forfeits the transfer gains and efficiency available among co-trainable tasks. Sparse Mixture-of-Experts (MoE) is a natural fit for this dual requirement: sparse activation enables modular capacity expansion as new tasks arrive, while routing decouples modality-level computation from task-level composition. In this work, we propose a scalable MoE framework for multitask pretraining and continual learning…
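
The abstract claims that sparse activation enables modular capacity expansion as new tasks arrive. A generic way to realize this is sketched below under assumptions the page does not confirm. Each task routes only among its own list of experts. When a new task arrives, all existing experts are frozen and fresh trainable experts are appended. The new task may also reuse frozen experts to borrow earlier representations. The class `ExpandableExpertPool` and its methods are hypothetical, and routing is dense within a task for brevity. Because an old task's experts and router are never updated, its outputs are unchanged after the new task trains, which is the forgetting-avoidance property being illustrated.

```python
import numpy as np


class ExpandableExpertPool:
    """Hypothetical expert pool in which each task routes only among its own experts."""

    def __init__(self, d_model, seed=0):
        self.d = d_model
        self.rng = np.random.default_rng(seed)
        self.experts = []        # list of (d, d) expert weight matrices
        self.frozen = []         # parallel list of flags; frozen experts receive no updates
        self.task_experts = {}   # task name -> expert indices the task may route to
        self.routers = {}        # task name -> (d, n_task_experts) gating matrix

    def add_task(self, task, n_new, reuse=()):
        # Freeze all existing experts before allocating new capacity for the incoming task.
        self.frozen = [True] * len(self.experts)
        new_ids = []
        for _ in range(n_new):
            self.experts.append(self.rng.normal(0, self.d ** -0.5, (self.d, self.d)))
            self.frozen.append(False)
            new_ids.append(len(self.experts) - 1)
        ids = list(reuse) + new_ids
        self.task_experts[task] = ids
        self.routers[task] = self.rng.normal(0, self.d ** -0.5, (self.d, len(ids)))

    def forward(self, x, task):
        """Softmax mixture over the task's own experts (top-k selection omitted for brevity)."""
        ids = self.task_experts[task]
        logits = x @ self.routers[task]
        gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
        gates /= gates.sum(axis=-1, keepdims=True)
        outs = np.stack([x @ self.experts[i] for i in ids], axis=-1)  # (n, d, n_task_experts)
        return (outs * gates[:, None, :]).sum(axis=-1)

    def sgd_step(self, grads, lr=0.1):
        """Apply per-expert gradients {expert_index: grad}, skipping frozen experts."""
        for i, g in grads.items():
            if not self.frozen[i]:
                self.experts[i] -= lr * g


pool = ExpandableExpertPool(d_model=8)
pool.add_task("A", n_new=3)
x = np.random.default_rng(1).normal(size=(4, 8))
before = pool.forward(x, "A")
pool.add_task("B", n_new=2, reuse=[0])   # task B reuses expert 0 and gets two fresh experts
# Stand-in gradients for every expert task B touches; only the unfrozen ones change.
pool.sgd_step({i: np.ones((8, 8)) for i in pool.task_experts["B"]})
after = pool.forward(x, "A")
print(np.allclose(before, after))  # True: task A's experts were frozen
```

Hard freezing is the simplest way to guarantee that old tasks do not drift. The cost is that reused experts cannot improve from new data, and the pool grows with every task. Compression schemes such as the low-rank one named in the contribution are one way to bound that growth.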