CL-VISTA: Benchmarking Continual Learning in Video Large Language Models
Haiyang Guo, Yichen Shi, Fei Zhu, Wenzhuo Liu, Hongbo Zhao, Fanhu Zeng, Shijie Ma, Da-Han Wang, Xu-Yao Zhang

TL;DR
CL-VISTA is a benchmark for evaluating continual learning in Video Large Language Models. It pairs 8 diverse tasks with 6 evaluation protocols across 3 dimensions, so that distribution shifts between tasks expose the catastrophic forgetting that single-dataset benchmarks fail to induce in pre-trained models.
Contribution
The paper presents CL-VISTA, a benchmark of 8 tasks spanning perception, understanding, and reasoning, together with evaluation protocols designed specifically for continual learning in Video-LLMs. It uses the benchmark to show the trade-offs among current continual learning methods.
Findings
No single continual learning method excels across all evaluated dimensions.
Methods reducing catastrophic forgetting often compromise generalization or increase resource costs.
Benchmarking reveals fundamental trade-offs in current continual learning approaches.
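For readers unfamiliar with how catastrophic forgetting is usually quantified, the following is a minimal sketch of two metrics that are standard in the continual learning literature: final average accuracy and average forgetting. It is not taken from the paper, and CL-VISTA's own 6 protocols may be defined differently. The accuracy matrix, its values, and the function names are all hypothetical, chosen only for illustration.

```python
from typing import List

# acc[i][j] = accuracy on task j, measured after training on tasks 0..i in sequence.
# Hypothetical numbers for a 3-task sequence (not results from the paper).
acc: List[List[float]] = [
    [0.80, 0.00, 0.00],
    [0.70, 0.90, 0.00],
    [0.60, 0.75, 0.85],
]


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks after training on the final task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each earlier task's best past accuracy to its final accuracy."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # nothing can be forgotten with a single task
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any point from when it was learned
        # up to (but not including) the final training stage.
        best_past = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_past - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    print(f"average accuracy:   {average_accuracy(acc):.3f}")
    print(f"average forgetting: {average_forgetting(acc):.3f}")
```

A method can score well on one of these numbers and poorly on the other, which is one simple way to see why no single metric is enough to rank continual learning methods.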
Abstract
Video Large Language Models (Video-LLMs) require continual learning to adapt to non-stationary real-world data. However, existing benchmarks fall short of evaluating modern foundation models: many still rely on models without large-scale pre-training, and prevailing benchmarks typically partition a single dataset into sub-tasks, resulting in high task redundancy and negligible forgetting on pre-trained Video-LLMs. To address these limitations, we propose CL-VISTA, a benchmark tailored for continual video understanding of Video-LLMs. By curating 8 diverse tasks spanning perception, understanding, and reasoning, CL-VISTA induces substantial distribution shifts that effectively expose catastrophic forgetting. To systematically assess CL methods, we establish a comprehensive evaluation framework comprising 6 distinct protocols across 3 critical dimensions: performance, computational…
