Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models

Xiwen Wei; Mustafa Munir; Radu Marculescu

arXiv:2512.03125·cs.LG·December 4, 2025

TL;DR

This paper addresses catastrophic forgetting in unified multimodal generative models by identifying inter-modal forgetting, providing a theoretical explanation, and proposing a novel architecture called MoDE that decouples modalities to improve continual learning performance.

Contribution

The paper introduces MoDE, a lightweight architecture that decouples modalities in unified multimodal generative models (UMGMs) to mitigate both intra- and inter-modal forgetting during continual learning.

Findings

1. MoDE significantly reduces inter- and intra-modal forgetting.

2. MoDE outperforms prior continual learning baselines in multimodal tasks.

3. A theoretical analysis attributes inter-modal forgetting to gradient conflict between modalities.

Abstract

Unified Multimodal Generative Models (UMGMs) unify visual understanding and image generation within a single autoregressive framework. However, their ability to continually learn new tasks is severely hindered by catastrophic forgetting, both within a modality (intra-modal) and across modalities (inter-modal). While intra-modal forgetting has been studied in prior continual learning (CL) work, inter-modal forgetting remains largely unexplored. In this paper, we identify and empirically validate this phenomenon in UMGMs and provide a theoretical explanation rooted in gradient conflict between modalities. To address both intra- and inter-modal forgetting, we propose Modality-Decoupled Experts (MoDE), a lightweight and scalable architecture that isolates modality-specific updates to mitigate the gradient conflict and leverages knowledge distillation to prevent catastrophic forgetting and…
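The notion of gradient conflict mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's analysis; it only shows the standard first-order argument that when two task gradients have a negative inner product, a gradient step on one task raises the other task's loss. The vectors `g_understanding` and `g_generation`, the learning rate, and the helper names are all hypothetical values chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; a negative value indicates conflicting gradients."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def first_order_loss_change(grad_other, grad_step, lr):
    """First-order Taylor estimate of how the *other* task's loss changes
    after one SGD step theta <- theta - lr * grad_step:
        L_other(theta - lr * g_step) - L_other(theta) ~= -lr * <g_other, g_step>
    """
    return -lr * dot(grad_other, grad_step)


# Hypothetical gradients of two modality-specific losses with respect to
# the same shared parameters (made-up numbers, not taken from the paper).
g_understanding = [1.0, 0.5, -0.2]
g_generation = [-0.8, 0.1, 0.3]

cos = cosine(g_understanding, g_generation)
delta = first_order_loss_change(g_generation, g_understanding, lr=0.1)

print(f"cosine similarity: {cos:.3f}")
print(f"estimated change in generation loss: {delta:+.3f}")
```

With these made-up vectors the cosine similarity is negative, so the estimated change in the generation loss is positive: training on the understanding task alone pushes the shared parameters in a direction that hurts generation. Keeping modality-specific updates in separate parameters, as the abstract describes for MoDE, is one way to avoid this interaction.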

Code & Models

Datasets

ChristinaW/MoDE-official
dataset · 41 downloads

Videos

Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis