Bisecle: Binding and Separation in Continual Learning for Video Language Understanding
Yue Tan, Xiaoqian Hu, Hao Xue, Celso De Melo, Flora D. Salim

TL;DR
Bisecle introduces a hippocampus-inspired continual learning method for video-language models, improving memory retention and task adaptation in evolving video understanding tasks with minimal parameter updates.
Contribution
The paper proposes Bisecle, a novel continual learning framework inspired by hippocampal mechanisms, incorporating multi-directional supervision and contrastive prompt learning for efficient video-language model adaptation.
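To make the phrase "contrastive prompt learning" concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss applied to prompt vectors. It is not the authors' implementation: the summary above does not specify the loss, so the function names (`cosine`, `info_nce`), the temperature value, and the toy three-dimensional "prompt" vectors are all hypothetical, chosen only to show how a contrastive objective rewards a task prompt for staying close to its own task's features and far from other tasks' prompts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's stated objective).

    The loss is small when the anchor is similar to the positive
    and dissimilar to every negative.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical toy embeddings: one prompt per task plus a feature from task A.
task_a_prompt = [1.0, 0.0, 0.0]
task_a_feature = [0.9, 0.1, 0.0]
task_b_prompt = [0.0, 1.0, 0.0]
task_c_prompt = [0.0, 0.0, 1.0]

# Well-separated task prompts: the other tasks' prompts are orthogonal to task A's.
separated = info_nce(task_a_prompt, task_a_feature, [task_b_prompt, task_c_prompt])

# Entangled task prompts: the other tasks' prompts nearly coincide with task A's.
entangled = info_nce(
    task_a_prompt, task_a_feature, [[0.95, 0.05, 0.0], [0.9, 0.0, 0.1]]
)

print(f"separated prompts loss: {separated:.4f}")
print(f"entangled prompts loss: {entangled:.4f}")
```

In this toy setup the loss is lower when task prompts are well separated, which is the general intuition behind using a contrastive term to reduce interference between tasks. How Bisecle actually constructs positives and negatives is not stated in this summary.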
Findings
Bisecle reduces catastrophic forgetting in VideoQA tasks.
It enhances cross-task generalization in continual learning scenarios.
The method demonstrates robustness across multiple benchmarks.
Abstract
Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), requiring models to adapt continually to shifting data distributions and novel scenarios. Because fine-tuning models on new tasks is computationally prohibitive, usually only a small subset of parameters is updated while the bulk of the model remains frozen. This poses new challenges to existing continual learning frameworks in the context of large multimodal foundation models, namely catastrophic forgetting and update conflict. While foundation models struggle with parameter-efficient continual learning, the hippocampus in the human brain has evolved highly efficient mechanisms for memory formation and consolidation. Inspired by the rapid…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection
