Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

Ali Vosoughi; Shijian Deng; Songyang Zhang; Yapeng Tian; Chenliang Xu; Jiebo Luo

arXiv:2305.19664 · cs.CV · June 1, 2023 · 2 cites

Open Access

TL;DR

This paper introduces a causal inference approach to simultaneously reduce vision and language biases in Visual Question Answering (VQA), improving generalization and accuracy by addressing confounding effects.

Contribution

It models the confounding effects of vision and language biases in VQA and proposes a novel counterfactual inference method to mitigate these biases concurrently.

Findings

01. Outperforms the state of the art on the VQA-CP v2 dataset

02. Reduces both vision and language biases

03. Improves accuracy on questions with numerical answers

Abstract

To increase the generalization capability of VQA systems, many recent studies have tried to de-bias spurious language or vision associations that shortcut the question or image to the answer. Despite these efforts, the literature fails to address the confounding effect of vision and language simultaneously. As a result, methods that reduce bias learned from one modality usually increase bias from the other. In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect. The model trained with this strategy can concurrently and efficiently reduce vision and language bias. To the best of our knowledge, this is the first work to reduce biases resulting from confounding effects of vision and language in VQA, leveraging causal explain-away relations. We accompany our…
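
To make the idea of counterfactual de-biasing concrete, here is a minimal sketch of one generic approach: subtract the scores of question-only and image-only shortcut branches from the fused scores at inference time. This is not the paper's model. The function `debiased_scores`, the weights `alpha` and `beta`, and all the numbers are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a generic subtraction-style counterfactual
# de-biasing step, NOT the method proposed in the paper.
# All names and numbers below are hypothetical.

def debiased_scores(fused, question_only, image_only, alpha=1.0, beta=1.0):
    """Remove unimodal shortcut scores from the fused answer scores.

    fused:         scores from a model that sees both question and image
    question_only: scores from a branch that sees only the question
    image_only:    scores from a branch that sees only the image
    alpha, beta:   assumed weights for each shortcut term
    """
    return [
        f - alpha * q - beta * v
        for f, q, v in zip(fused, question_only, image_only)
    ]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


answers = ["yes", "no", "2"]

# Toy scores: the question-only branch strongly prefers "yes" (a language
# prior) and the image-only branch prefers "no" (a vision prior).
fused = [2.0, 1.5, 1.8]
question_only = [1.5, 0.2, 0.1]
image_only = [0.1, 0.9, 0.2]

biased_prediction = answers[argmax(fused)]
scores = debiased_scores(fused, question_only, image_only)
prediction = answers[argmax(scores)]

print("before:", biased_prediction)
print("after: ", prediction)
```

In this toy setting the fused scores favor the answer that the language prior pushes, and subtracting both shortcut terms changes the prediction. Removing both terms at once is the point of the illustration: correcting only one modality would leave the other shortcut in place.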

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling