Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting

Yuyang Liu; Qiuhe Hong; Linlan Huang; Alexandra Gomez-Villa; Dipam Goswami; Xialei Liu; Joost van de Weijer; Yonghong Tian

arXiv:2508.04227 · cs.CV · May 19, 2026

TL;DR

This survey comprehensively reviews continual learning challenges and solutions for vision-language models and multimodal large language models, emphasizing unique issues like cross-modal feature drift and zero-shot capability erosion.

Contribution

It introduces a challenge-driven taxonomy for continual learning in VLMs and MLLMs, analyzing failure modes and proposing future research directions.

Findings

01 Deconstructed failure modes of VLMs and MLLMs in continual learning.

02 Proposed a four-paradigm taxonomy for addressing continual learning challenges.

03 Highlighted the importance of dual-track benchmarks and micro-diagnostic evaluations.

Abstract

Vision-language models (VLMs) and the recent surge of Multimodal Large Language Models (MLLMs) have revolutionized artificial intelligence with unprecedented cross-modal alignment and zero-shot generalization. However, enabling them to learn continually from non-stationary data remains a major challenge, as their cross-modal alignment and generalization capabilities are particularly vulnerable to catastrophic forgetting. Unlike traditional unimodal continual learning (CL), VLMs face unique challenges such as cross-modal feature drift, parameter interference due to shared architectures, and zero-shot capability erosion. Furthermore, generative MLLMs exhibit a unique "alignment tax," where catastrophic forgetting manifests not merely as factual amnesia, but as a systemic collapse of deep Chain-of-Thought (CoT) reasoning. This survey presents the first comprehensive, diagnostic review…

Code & Models

Repositories

YuyangSunshine/Awesome-Continual-learning-of-Vision-Language-Models
github
