Admitting Ignorance Helps the Video Question Answering Models to Answer

Haopeng Li; Tom Drummond; Mingming Gong; Mohammed Bennamoun; Qiuhong Ke

arXiv:2501.08771 · cs.CV · July 3, 2025

Open Access

TL;DR

This paper introduces a novel training framework for video question answering that encourages models to admit ignorance when faced with intervened questions, reducing spurious correlations and improving performance.

Contribution

The paper proposes a method to mitigate shortcut learning in VideoQA: during training, the model must admit ignorance when it is given an intervened question instead of guessing from question-answer correlations.

Findings

01. Significant performance improvements on VideoQA benchmarks.

02. Effective reduction of spurious correlations in model predictions.

03. Minimal modifications needed to existing models.

Abstract

Significant progress has been made in the field of video question answering (VideoQA) thanks to deep learning and large-scale pretraining. Despite the presence of sophisticated model structures and powerful video-text foundation models, most existing methods focus solely on maximizing the correlation between answers and video-question pairs during training. We argue that these models often establish shortcuts, resulting in spurious correlations between questions and answers, especially when the alignment between video and text data is suboptimal. To address these spurious correlations, we propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question, rather than making guesses solely based on superficial question-answer correlations. We introduce methodologies for intervening in questions, utilizing techniques…
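
The training idea described in the abstract can be read as a two-term objective: answer the original question correctly, and admit ignorance on the intervened one. The snippet below is a minimal sketch of that reading, not the authors' implementation. It assumes that ignorance is modeled as an extra "unknown" answer slot, that both terms are plain cross-entropy, and that the logits for the intervened question have already been computed (the abstract's description of the intervention techniques is truncated, so no intervention is implemented here). The names `admit_ignorance_loss`, `unknown_index`, and `weight` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the target class."""
    return -log_softmax(logits)[target_index]


def admit_ignorance_loss(orig_logits, intervened_logits,
                         answer_index, unknown_index, weight=1.0):
    """Two-term loss (assumed form, not taken from the paper).

    orig_logits:       scores for (video, original question)
    intervened_logits: scores for (video, intervened question)
    answer_index:      ground-truth answer for the original question
    unknown_index:     hypothetical extra slot meaning "I don't know"
    weight:            hypothetical trade-off between the two terms
    """
    answer_term = cross_entropy(orig_logits, answer_index)
    ignorance_term = cross_entropy(intervened_logits, unknown_index)
    return answer_term + weight * ignorance_term


if __name__ == "__main__":
    # Four candidate answers plus one "unknown" slot at index 4.
    confident = [5.0, 0.0, 0.0, 0.0, 0.0]
    abstains = [0.0, 0.0, 0.0, 0.0, 5.0]

    # A model that abstains on the intervened question ...
    print(admit_ignorance_loss(confident, abstains, 0, 4))
    # ... versus one that keeps guessing the same answer (a shortcut).
    print(admit_ignorance_loss(confident, confident, 0, 4))
```

Under these assumptions, a model that still outputs the original answer after the question has been intervened on is penalized by the second term, which is the behavior the abstract attributes to shortcut learning.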

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Focus