CL-VISTA: Benchmarking Continual Learning in Video Large Language Models

Haiyang Guo; Yichen Shi; Fei Zhu; Wenzhuo Liu; Hongbo Zhao; Fanhu Zeng; Shijie Ma; Da-Han Wang; Xu-Yao Zhang

arXiv:2604.00677·cs.CV·April 2, 2026

TL;DR

CL-VISTA is a benchmark for evaluating continual learning in Video Large Language Models. Instead of partitioning a single dataset into sub-tasks, it curates 8 diverse tasks spanning perception, understanding, and reasoning, and assesses methods under 6 protocols across 3 dimensions.

Contribution

The paper presents CL-VISTA, a new benchmark with diverse tasks and evaluation protocols specifically designed for continual learning in Video-LLMs, highlighting current trade-offs among methods.

Findings

1. No single continual learning method excels across all evaluated dimensions.
2. Methods reducing catastrophic forgetting often compromise generalization or increase resource costs.
3. Benchmarking reveals fundamental trade-offs in current continual learning approaches.

Abstract

Video Large Language Models (Video-LLMs) require continual learning to adapt to non-stationary real-world data. However, existing benchmarks fall short of evaluating modern foundation models: many still rely on models without large-scale pre-training, and prevailing benchmarks typically partition a single dataset into sub-tasks, resulting in high task redundancy and negligible forgetting on pre-trained Video-LLMs. To address these limitations, we propose CL-VISTA, a benchmark tailored for continual video understanding of Video-LLMs. By curating 8 diverse tasks spanning perception, understanding, and reasoning, CL-VISTA induces substantial distribution shifts that effectively expose catastrophic forgetting. To systematically assess CL methods, we establish a comprehensive evaluation framework comprising 6 distinct protocols across 3 critical dimensions: performance, computational…
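
To make "catastrophic forgetting" concrete, the sketch below computes two metrics that are standard in the continual learning literature: final average accuracy and average forgetting. This is an illustration under stated assumptions, not CL-VISTA's evaluation code; the abstract does not spell out its six protocols, and the function names and accuracy values here are hypothetical. The input `acc[i][j]` is assumed to hold the accuracy on task `j` measured after training on task `i`.

```python
from typing import List


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks, measured after training on the last task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each earlier task's best past accuracy to its final accuracy.

    The last task is excluded because nothing has been trained after it.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any step from when it was learned
        # up to (but not including) the final step.
        best_past = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_past - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)


# Hypothetical accuracies for a 3-task sequence (not results from the paper).
# Entries above the diagonal are tasks not yet trained and are unused here.
acc_matrix = [
    [0.80, 0.00, 0.00],
    [0.70, 0.75, 0.00],
    [0.60, 0.65, 0.90],
]

final_avg = average_accuracy(acc_matrix)
forgetting = average_forgetting(acc_matrix)
print(f"final average accuracy: {final_avg:.3f}")
print(f"average forgetting: {forgetting:.3f}")
```

A benchmark whose tasks are near-duplicates of each other would yield forgetting close to zero under this metric, which is the redundancy problem the abstract attributes to single-dataset splits.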

Code & Models

Datasets

MLLM-CL/CL-VISTA
dataset · 706 downloads
