Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models
Xiwen Wei, Mustafa Munir, Radu Marculescu

TL;DR
This paper addresses catastrophic forgetting in unified multimodal generative models (UMGMs). It identifies inter-modal forgetting, explains it theoretically as gradient conflict between modalities, and proposes Modality-Decoupled Experts (MoDE), an architecture that decouples modalities to improve continual learning performance.
Contribution
The paper introduces MoDE, a lightweight architecture that decouples modalities in UMGMs to mitigate both intra- and inter-modal forgetting during continual learning.
Findings
MoDE significantly reduces inter- and intra-modal forgetting.
MoDE outperforms prior continual learning baselines in multimodal tasks.
A theoretical analysis attributes inter-modal forgetting to gradient conflict between modalities.
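The gradient conflict named in the findings can be illustrated with a toy example. This is a minimal sketch, not the paper's analysis: the two quadratic losses, the target vectors, and all function names are assumptions chosen for illustration. It shows that when the gradients of two modality losses on shared parameters have negative cosine similarity, a descent step on one loss raises the other.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two gradient vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def loss(theta, target):
    """Toy per-modality loss: 0.5 * ||theta - target||^2 (an assumption)."""
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, target))

def grad(theta, target):
    """Gradient of the toy loss with respect to the shared parameters."""
    return [t - c for t, c in zip(theta, target)]

# Shared parameters and hypothetical optima for each modality.
theta = [0.0, 0.0]
target_understanding = [1.0, 0.0]   # hypothetical optimum for visual understanding
target_generation = [-1.0, 0.5]     # hypothetical optimum for image generation

g_und = grad(theta, target_understanding)
g_gen = grad(theta, target_generation)

# Negative cosine similarity means the two gradients conflict.
cos = cosine_similarity(g_und, g_gen)
conflict = cos < 0

# Take one gradient step on the generation loss only.
lr = 0.1
theta_after = [t - lr * g for t, g in zip(theta, g_gen)]

und_before = loss(theta, target_understanding)
und_after = loss(theta_after, target_understanding)
gen_before = loss(theta, target_generation)
gen_after = loss(theta_after, target_generation)

print(f"cosine similarity: {cos:.3f}, conflict: {conflict}")
print(f"generation loss:    {gen_before:.5f} -> {gen_after:.5f}")
print(f"understanding loss: {und_before:.5f} -> {und_after:.5f}")
```

In this toy setting the generation loss falls while the understanding loss rises, which is the cross-modality interference that isolating modality-specific updates is meant to avoid.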
Abstract
Unified Multimodal Generative Models (UMGMs) unify visual understanding and image generation within a single autoregressive framework. However, their ability to continually learn new tasks is severely hindered by catastrophic forgetting, both within a modality (intra-modal) and across modalities (inter-modal). While intra-modal forgetting has been studied in prior continual learning (CL) work, inter-modal forgetting remains largely unexplored. In this paper, we identify and empirically validate this phenomenon in UMGMs and provide a theoretical explanation rooted in gradient conflict between modalities. To address both intra- and inter-modal forgetting, we propose Modality-Decoupled Experts (MoDE), a lightweight and scalable architecture that isolates modality-specific updates to mitigate the gradient conflict and leverages knowledge distillation to prevent catastrophic forgetting and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
