Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning
Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou

TL;DR
This paper introduces a contrastive learning method called MMBS that leverages biased samples to improve the robustness of VQA models against out-of-distribution data without sacrificing in-distribution performance.
Contribution
The paper proposes a novel contrastive learning approach that exploits biased samples for unbiased reasoning, enhancing VQA model robustness against OOD data while maintaining ID accuracy.
Findings
Achieves competitive OOD performance on VQA-CP v2
Maintains strong ID performance on VQA v2
Compatible with various VQA backbones
Abstract
Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., the language priors, that appear in the biased samples of the training set, which makes them brittle against out-of-distribution (OOD) test data. Recent methods have made promising progress on this problem by reducing the impact of biased samples on model training. However, these models reveal a trade-off: the improvements on OOD data come at a severe cost to performance on the in-distribution (ID) data (which is dominated by the biased samples). Therefore, we propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples and explore several strategies to use…
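The contrastive idea in the abstract can be illustrated with a generic InfoNCE-style loss. This is only a minimal sketch, not the paper's actual objective: the function names, the temperature value, and the toy vectors are all assumptions made for illustration. The vectors stand in for joint image-question embeddings, where the positive would come from a sample with the bias-related information removed and the negatives from other samples.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption here, not the paper's exact loss).

    The loss is small when the anchor is closer to the positive than to
    any negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical toy embeddings, chosen only to show the loss behaviour.
anchor = [1.0, 0.0]
aligned_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

With these toy inputs, the loss is lower when the positive points in nearly the same direction as the anchor than when a negative does, which is the behaviour a contrastive objective rewards during training.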
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Test · Contrastive Learning
