Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation
Daowan Peng, Wei Wei

TL;DR
This paper introduces KDAR, a knowledge distillation-based method that reduces language priors in VQA models, improving their generalization and achieving state-of-the-art results on OOD benchmarks.
Contribution
The paper proposes KDAR, a novel knowledge distillation approach with adaptive reweighting to mitigate language bias in VQA models, enhancing out-of-distribution performance.
Findings
KDAR outperforms previous methods on the VQA-CP v2 OOD benchmark.
Soft labels from a well-trained teacher model regularize answer prediction and penalize overfitting to the most common answers.
Adaptive sample reweighting further improves bias mitigation.
Abstract
Previous studies have pointed out that visual question answering (VQA) models are prone to relying on language priors for answer predictions. In this context, predictions often depend on linguistic shortcuts rather than a comprehensive grasp of multimodal knowledge, which diminishes their generalization ability. In this paper, we propose a novel method, namely, KDAR, leveraging knowledge distillation to address the prior-dependency dilemmas within the VQA task. Specifically, the regularization effect facilitated by soft labels from a well-trained teacher is employed to penalize overfitting to the most common answers. The soft labels, which serve a regularization role, also provide semantic guidance that narrows the range of candidate answers. Additionally, we design an adaptive sample-wise reweighting learning strategy to further mitigate bias by dynamically adjusting the importance of…
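The soft-label regularization described in the abstract can be illustrated with a generic knowledge distillation loss. The sketch below is not the authors' KDAR implementation: it assumes a single-label answer classifier, a standard temperature-scaled KL term between teacher and student, and a per-sample `weight` supplied by the caller. The names `kd_loss`, `temperature`, `alpha`, and `weight` are hypothetical, and the summary above does not say how KDAR computes its adaptive weights, so that part is left as a plain input.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, target,
            temperature=2.0, alpha=0.5, weight=1.0):
    """Generic distillation loss for one sample (illustrative only).

    alpha balances the soft-label term against the hard-label term.
    weight is a caller-supplied sample weight; how KDAR derives it
    is not specified here, so it is treated as an assumption.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)

    # KL(teacher || student) on temperature-softened distributions.
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student_soft) if t > 0)

    # Standard cross-entropy against the ground-truth answer index.
    ce = -math.log(softmax(student_logits)[target])

    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return weight * (alpha * temperature ** 2 * kl + (1 - alpha) * ce)


if __name__ == "__main__":
    student = [0.2, 1.5, -0.3]   # student favors answer 1
    teacher = [2.0, 0.1, 0.0]    # teacher favors answer 0
    print(kd_loss(student, teacher, target=0))
    print(kd_loss(student, teacher, target=0, weight=2.0))
```

In this sketch, a student that puts most of its mass on a frequent but wrong answer pays both the cross-entropy penalty and the KL penalty against the teacher's softer distribution, and a larger `weight` scales that sample's contribution to the total loss.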
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Advanced Image and Video Retrieval Techniques
Methods: Knowledge Distillation
