Admitting Ignorance Helps the Video Question Answering Models to Answer
Haopeng Li, Tom Drummond, Mingming Gong, Mohammed Bennamoun, Qiuhong Ke

TL;DR
This paper introduces a novel training framework for video question answering that encourages models to admit ignorance when faced with intervened questions, reducing spurious correlations and improving performance.
Contribution
It proposes a new method to mitigate shortcut learning in VideoQA by forcing models to recognize their ignorance through question intervention techniques.
Findings
Significant performance improvements on VideoQA benchmarks.
Effective reduction of spurious correlations in model predictions.
Minimal modifications needed to existing models.
Abstract
Significant progress has been made in the field of video question answering (VideoQA) thanks to deep learning and large-scale pretraining. Despite the presence of sophisticated model structures and powerful video-text foundation models, most existing methods focus solely on maximizing the correlation between answers and video-question pairs during training. We argue that these models often establish shortcuts, resulting in spurious correlations between questions and answers, especially when the alignment between video and text data is suboptimal. To address these spurious correlations, we propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question, rather than making guesses solely based on superficial question-answer correlations. We introduce methodologies for intervening in questions, utilizing techniques…
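To make the training idea in the abstract concrete, here is a minimal sketch of one way such an objective could look. It is not the authors' implementation. It assumes that "acknowledging ignorance" means pushing the answer distribution for an intervened question toward uniform, and that this term is added to the usual cross-entropy on the original question. The function names, the uniform target, and the `weight` parameter are all hypothetical. The intervention itself is left out because the abstract is truncated before it names the techniques.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Standard answer loss for the original (video, question) pair."""
    return -math.log(softmax(logits)[target_index])


def ignorance_penalty(logits):
    """KL(uniform || p): zero only when the prediction is uniform.

    Assumption: a uniform answer distribution stands in for
    "admitting ignorance"; the paper may define this differently.
    """
    probs = softmax(logits)
    u = 1.0 / len(probs)
    return sum(u * math.log(u / p) for p in probs)


def combined_loss(orig_logits, intervened_logits, target_index, weight=1.0):
    """Answer loss on the original question plus a weighted ignorance
    term on the intervened question (weight is a hypothetical knob)."""
    return (cross_entropy(orig_logits, target_index)
            + weight * ignorance_penalty(intervened_logits))


if __name__ == "__main__":
    # Hypothetical logits over four candidate answers.
    original = [2.0, 0.5, 0.1, -1.0]
    confident_on_intervened = [3.0, 0.0, 0.0, 0.0]  # guessing from the question alone
    unsure_on_intervened = [0.0, 0.0, 0.0, 0.0]     # admits ignorance
    print(combined_loss(original, confident_on_intervened, target_index=0))
    print(combined_loss(original, unsure_on_intervened, target_index=0))
```

Under these assumptions, a model that stays confident on an intervened question pays an extra penalty, while one that spreads its probability evenly pays only the ordinary answer loss.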
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Focus
