Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation

Evelyn Chee; Wynne Hsu; Mong Li Lee

arXiv:2511.06723·cs.LG·November 11, 2025

Open Access

TL;DR

This paper introduces a multi-modal continual learning framework that uses cross-modality adapters and representation alignment to integrate data from multiple modalities while preventing catastrophic forgetting.

Contribution

The paper proposes a framework built on a pre-trained model, combining a cross-modality adapter that has a mixture-of-experts structure with a new representation alignment loss for multi-modal continual learning.

Findings

01 Outperforms baselines in class-incremental learning

02 Achieves higher accuracy on multi-modal datasets

03 Reduces catastrophic forgetting effectively

Abstract

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing diverse sensory inputs, akin to human perception. However, multi-modal continual learning presents additional challenges, as the model must effectively integrate new information from various modalities while preventing catastrophic forgetting. In this work, we propose a pre-trained model-based framework for multi-modal continual learning. Our framework includes a novel cross-modality adapter with a mixture-of-experts structure to facilitate effective integration of multi-modal information across tasks. We also introduce a representation alignment loss that fosters learning of robust multi-modal representations, and regularize relationships between…
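The abstract names two components without giving their internals: a cross-modality adapter with a mixture-of-experts structure, and a representation alignment loss. The sketch below is only an illustration of those general ideas, not the authors' design. Everything in it is an assumption made for the example: the class `MoEAdapter`, its bottleneck experts, the softmax gate over concatenated features, the residual connection onto the first modality, and the use of cosine distance as a stand-in alignment loss.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used as placeholder weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoEAdapter:
    """Hypothetical mixture-of-experts adapter over two modality features.

    Each expert is a small bottleneck (down-project, ReLU, up-project).
    A softmax gate on the concatenated features weights the experts.
    """

    def __init__(self, dim, bottleneck, num_experts, seed=0):
        rng = random.Random(seed)
        in_dim = 2 * dim  # features of both modalities, concatenated
        self.gate = rand_matrix(num_experts, in_dim, rng)
        self.down = [rand_matrix(bottleneck, in_dim, rng) for _ in range(num_experts)]
        self.up = [rand_matrix(dim, bottleneck, rng) for _ in range(num_experts)]

    def gate_weights(self, feat_a, feat_b):
        """Return one mixing weight per expert; the weights sum to 1."""
        return softmax(matvec(self.gate, feat_a + feat_b))

    def forward(self, feat_a, feat_b):
        joint = feat_a + feat_b  # list concatenation, not element-wise sum
        weights = self.gate_weights(feat_a, feat_b)
        mixed = [0.0] * len(feat_a)
        for w, down, up in zip(weights, self.down, self.up):
            hidden = [max(0.0, h) for h in matvec(down, joint)]  # ReLU
            out = matvec(up, hidden)
            mixed = [m + w * o for m, o in zip(mixed, out)]
        # Residual connection onto modality A (an assumption of this sketch).
        return [a + m for a, m in zip(feat_a, mixed)]


def cosine_alignment_loss(u, v, eps=1e-12):
    """Stand-in alignment loss: 1 - cosine similarity of two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


if __name__ == "__main__":
    adapter = MoEAdapter(dim=4, bottleneck=2, num_experts=3)
    image_feat = [1.0, 0.0, -1.0, 0.5]
    audio_feat = [0.2, 0.3, 0.1, -0.4]
    fused = adapter.forward(image_feat, audio_feat)
    print("gate weights:", adapter.gate_weights(image_feat, audio_feat))
    print("fused features:", fused)
    print("alignment loss:", cosine_alignment_loss(fused, audio_feat))
```

In a continual-learning setting, the pre-trained backbone that produces `image_feat` and `audio_feat` would typically stay frozen while only the small adapter weights are trained per task. Whether the paper does exactly this is not stated in the truncated abstract.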

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection