TL;DR
This survey reviews recent progress in multimodal continual learning (MMCL), highlights challenges such as multimodal catastrophic forgetting and modality imbalance, and groups existing methods into four categories: regularization-based, architecture-based, replay-based, and prompt-based.
Contribution
First comprehensive survey on multimodal continual learning (MMCL), covering background, a taxonomy of methods, datasets, benchmarks, and future directions, with a GitHub resource collecting related papers and tools.
Findings
Categorized MMCL methods into four groups: regularization-based, architecture-based, replay-based, and prompt-based.
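Identified key challenges: multimodal catastrophic forgetting, modality imbalance, complex modality interaction, high computational costs, and degradation of the pre-trained zero-shot capability of multimodal backbones.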
Summarized open datasets and benchmarks to support future research.
Abstract
Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary complexity of MMCL is that it extends beyond a simple stacking of unimodal CL methods. Such straightforward approaches often suffer from multimodal catastrophic forgetting, yielding unsatisfactory performance. In addition, MMCL introduces new challenges that unimodal CL methods fail to adequately address, including modality imbalance, complex modality interaction, high computational costs, and degradation of pre-trained zero-shot capability of multimodal backbones. In this work, we present the first comprehensive survey on…
