Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning

Qingyi Si; Yuanxin Liu; Fandong Meng; Zheng Lin; Peng Fu; Yanan Cao; Weiping Wang; Jie Zhou

arXiv:2210.04563·cs.CV·October 11, 2022·5 cites

TL;DR

This paper introduces a contrastive learning method called MMBS that leverages biased samples to improve the robustness of VQA models against out-of-distribution data without sacrificing in-distribution performance.

Contribution

The paper proposes a novel contrastive learning approach that exploits biased samples for unbiased reasoning, enhancing VQA model robustness against OOD data while maintaining ID accuracy.

Findings

01. Achieves competitive OOD performance on VQA-CP v2
02. Maintains strong ID performance on VQA v2
03. Compatible with various VQA backbones

Abstract

Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., the language priors, that appear in the biased samples of the training set, which makes them brittle against out-of-distribution (OOD) test data. Recent methods have made promising progress on this problem by reducing the impact of biased samples on model training. However, these models exhibit a trade-off: the improvements on OOD data come at a severe cost to performance on in-distribution (ID) data (which is dominated by the biased samples). Therefore, we propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples and explore several strategies to use…
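The abstract describes contrastive learning with positive samples built by removing spurious-correlation information, but the excerpt is truncated before it states the loss. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss for a single anchor. It is an assumption for illustration, not the MMBS objective; the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (hypothetical helper, not from the paper).

    The loss is -log( exp(s_pos / t) / sum_k exp(s_k / t) ), where the sum runs
    over the positive and every negative.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy usage: the anchor stands in for an original sample's representation and
# the positive for a representation of the same sample with bias-related
# information removed (both vectors are made up for this example).
anchor = [1.0, 0.2]
positive = [0.9, 0.3]
negatives = [[-1.0, 0.1], [0.0, -1.0]]
loss = info_nce(anchor, positive, negatives)
```

Under this formulation the loss shrinks as the anchor moves closer to its positive and farther from the negatives, which is the general mechanism any contrastive objective of this family relies on.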

Code & Models

Repositories

phoebussi/mmbs (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Test · Contrastive Learning