Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting

Chen Cai; Zheng Wang; Jianjun Gao; Wenyang Liu; Ye Lu; Runzhong Zhang; Kim-Hui Yap

arXiv:2410.00771 · cs.CV · January 20, 2025

TL;DR

This paper introduces Collaborative Prompting (ColPro), a method for continual Video Question Answering (VideoQA) that mitigates catastrophic forgetting in large language models by combining three complementary prompts (question constraint, knowledge acquisition, and visual temporal awareness), and reports improved accuracy on the NExT-QA and DramaQA benchmarks.

Contribution

The paper proposes ColPro, a prompting-based approach that enables continual learning for VideoQA with large language models while addressing catastrophic forgetting.

Findings

1. ColPro outperforms existing methods on the NExT-QA and DramaQA datasets.
2. ColPro achieves 55.14% accuracy on NExT-QA.
3. ColPro achieves 71.24% accuracy on DramaQA.

Abstract

In recent years, the rapid increase in online video content has underscored the limitations of static Video Question Answering (VideoQA) models trained on fixed datasets, as they struggle to adapt to new questions or tasks posed by newly available content. In this paper, we explore the novel challenge of VideoQA within a continual learning framework, and empirically identify a critical issue: fine-tuning a large language model (LLM) for a sequence of tasks often results in catastrophic forgetting. To address this, we propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting. These prompts aim to capture textual question context, visual content, and video temporal dynamics in VideoQA, a perspective underexplored in prior research. Experimental results on the NExT-QA and…
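The abstract describes ColPro only at a high level, so the following is not the authors' implementation (that lives in the caicch/colpro repository). It is a minimal, stdlib-only Python sketch of the generic idea behind prompt-based continual learning, assuming the three prompt types are realised as learnable vectors prepended to the embeddings that a frozen LLM receives. The class `PromptSet`, the function `build_llm_input`, the prompt lengths, and the 16-dimensional embeddings are all hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class PromptSet:
    """A block of learnable prompt vectors (hypothetical; shapes are illustrative)."""
    name: str
    length: int  # number of prompt tokens
    dim: int     # embedding width; must match the frozen LLM's token embeddings
    vectors: List[Vector] = field(init=False)

    def __post_init__(self) -> None:
        # Small random init, seeded by name so the sketch is reproducible.
        rng = random.Random(self.name)
        self.vectors = [[rng.gauss(0.0, 0.02) for _ in range(self.dim)]
                        for _ in range(self.length)]

    def num_params(self) -> int:
        return self.length * self.dim


def build_llm_input(prompts: List[PromptSet],
                    video_tokens: List[Vector],
                    question_tokens: List[Vector]) -> List[Vector]:
    """Prepend every prompt set to the video and question embeddings."""
    dim = prompts[0].dim
    for vec in video_tokens + question_tokens:
        if len(vec) != dim:
            raise ValueError("all embeddings must share the prompt dimension")
    sequence: List[Vector] = []
    for prompt in prompts:
        sequence.extend(prompt.vectors)
    sequence.extend(video_tokens)
    sequence.extend(question_tokens)
    return sequence


# Hypothetical prompt sets named after the three ColPro components.
DIM = 16
prompts = [
    PromptSet("question_constraint", length=4, dim=DIM),
    PromptSet("knowledge_acquisition", length=4, dim=DIM),
    PromptSet("visual_temporal_awareness", length=4, dim=DIM),
]
video = [[0.0] * DIM for _ in range(8)]     # stand-in for 8 frame features
question = [[0.0] * DIM for _ in range(6)]  # stand-in for 6 question tokens

seq = build_llm_input(prompts, video, question)
# In this sketch only the prompt vectors count as trainable; the LLM is frozen.
trainable = sum(p.num_params() for p in prompts)
print(len(seq), trainable)  # 26 192
```

Under this assumption only the 192 prompt values would be updated when a new task arrives, while the backbone's weights stay fixed; a common motivation for prompt-based methods is that leaving the backbone untouched limits catastrophic forgetting. How ColPro actually conditions each prompt on the question, on acquired knowledge, or on frame order is not specified in the abstract and is not modelled here.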

Code & Models

Repositories

caicch/colpro · PyTorch · Official

Videos

Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning