Bisecle: Binding and Separation in Continual Learning for Video Language Understanding

Yue Tan; Xiaoqian Hu; Hao Xue; Celso De Melo; Flora D. Salim

arXiv:2507.00469 · cs.CV · July 2, 2025

TL;DR

Bisecle introduces a hippocampus-inspired continual learning method for video-language models, improving memory retention and task adaptation in evolving video understanding tasks with minimal parameter updates.

Contribution

The paper proposes Bisecle, a continual learning framework for video-language models inspired by hippocampal binding and separation mechanisms. It combines multi-directional supervision with contrastive prompt learning to adapt the model efficiently to new tasks.
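As a rough illustration of what a contrastive objective over task prompts can look like, the sketch below computes an InfoNCE-style loss that pulls a query feature toward its own task's prompt and pushes it away from the other tasks' prompts. This is an assumption-laden sketch, not Bisecle's published loss: the function names, the use of cosine similarity, the single prompt vector per task, and the `temperature` value are all hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prompt_contrastive_loss(
    query: Sequence[float],
    prompts: Sequence[Sequence[float]],
    task_id: int,
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE-style loss: prompts[task_id] is the positive,
    every other task's prompt acts as a negative."""
    logits = [cosine(query, p) / temperature for p in prompts]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # Equals -log softmax(logits)[task_id].
    return log_denominator - logits[task_id]


if __name__ == "__main__":
    prompts = [[1.0, 0.0], [0.0, 1.0]]  # one toy prompt vector per task
    query = [0.9, 0.1]                  # feature that is closer to task 0's prompt
    print(prompt_contrastive_loss(query, prompts, task_id=0))  # small loss
    print(prompt_contrastive_loss(query, prompts, task_id=1))  # much larger loss
```

Minimizing such a loss keeps each task's prompt distinct from the others, which is one intuitive way to read "separation"; how the paper actually forms positives and negatives may differ.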

Findings

1. Bisecle reduces catastrophic forgetting in VideoQA tasks.
2. It enhances cross-task generalization in continual learning scenarios.
3. The method demonstrates robustness across multiple benchmarks.
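Catastrophic forgetting is commonly quantified with an average-forgetting score: for each earlier task, the drop from the best accuracy it reached during training to its accuracy after the final task. The sketch below implements that common definition; it is not necessarily the exact metric reported for Bisecle, and the accuracy matrix in the example is made-up toy data, not a result from the paper.

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """acc[i][j] = accuracy on task j after training on task i (j <= i).

    Forgetting on task j is the best accuracy it reached before the final
    task minus its accuracy after the final task; the score averages this
    over every task except the last one. Lower means less forgetting.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    final = acc[-1]
    drops = []
    for j in range(num_tasks - 1):
        best_before_final = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before_final - final[j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Toy lower-triangular accuracy matrix for three sequential tasks (made-up numbers).
    acc = [
        [80.0],
        [70.0, 75.0],
        [65.0, 72.0, 78.0],
    ]
    print(average_forgetting(acc))  # 9.0
```

Here task 0 drops from 80.0 to 65.0 and task 1 from 75.0 to 72.0, so the average forgetting is (15.0 + 3.0) / 2 = 9.0.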

Abstract

Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), requiring models to continually adapt to shifting data distributions and novel scenarios. Given the prohibitive computational cost of fine-tuning models on new tasks, usually only a small subset of parameters is updated while the bulk of the model remains frozen. This poses new challenges to existing continual learning frameworks in the context of large multimodal foundation models, namely catastrophic forgetting and update conflict. While foundation models struggle with parameter-efficient continual learning, the hippocampus in the human brain has evolved highly efficient mechanisms for memory formation and consolidation. Inspired by the rapid…
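The parameter-efficient setting described in the abstract can be pictured with a toy sketch in which each parameter group carries a trainable flag and an update step skips every frozen group. The `ParamGroup` class, the group names, the sizes, and the plain SGD update are illustrative assumptions; a real VLM would rely on its deep-learning framework's own freezing mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParamGroup:
    values: List[float]
    trainable: bool


def sgd_step(params: Dict[str, ParamGroup], grads: Dict[str, List[float]], lr: float) -> None:
    """Apply one SGD update to trainable groups only; frozen groups are skipped."""
    for name, group in params.items():
        if not group.trainable:
            continue  # frozen backbone weights never change
        group.values = [v - lr * g for v, g in zip(group.values, grads[name])]


def trainable_fraction(params: Dict[str, ParamGroup]) -> float:
    """Share of all scalar parameters that are actually updated."""
    total = sum(len(g.values) for g in params.values())
    trainable = sum(len(g.values) for g in params.values() if g.trainable)
    return trainable / total


if __name__ == "__main__":
    params = {
        "backbone": ParamGroup(values=[0.5] * 990, trainable=False),  # hypothetical frozen VLM weights
        "prompt": ParamGroup(values=[0.0] * 10, trainable=True),      # hypothetical small trainable module
    }
    grads = {"backbone": [1.0] * 990, "prompt": [1.0] * 10}
    sgd_step(params, grads, lr=0.1)
    print(trainable_fraction(params))  # 0.01
    print(params["backbone"].values[0], params["prompt"].values[0])  # 0.5 -0.1
```

Because every new task writes into the same small trainable subset, successive tasks can overwrite one another's updates; this is one intuitive reading of the forgetting and update-conflict challenges named above, not a description of how Bisecle resolves them.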

Videos

Bisecle: Binding and Separation in Continual Learning for Video Language Understanding · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection