Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention

Ying Liu; Ge Bai; Chenji Lu; Shilong Li; Zhang Zhang; Ruifang Liu; Wenbin Guo

arXiv:2410.10184·cs.CV·October 15, 2024

TL;DR

This paper introduces CIBi, a causal intervention method that reduces fine-grained language bias in Visual Question Answering by targeting context and keyword biases, leading to improved model robustness.

Contribution

The paper presents a novel causal intervention training scheme that addresses fine-grained language biases in VQA, utilizing contrastive learning and counterfactual generation for bias elimination.
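
As a rough illustration of the contrastive-learning ingredient mentioned above, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor representation. It is not the paper's loss: this summary does not say how CIBi builds its positive and negative pairs, so the function names (`cosine`, `info_nce`), the temperature value, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (hypothetical helper).

    The loss is small when the anchor is closer to the positive
    than to every negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy placeholder vectors standing in for multi-modal representations.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

# Well-aligned pair: the true positive is treated as the positive.
loss_good = info_nce(anchor, positive, negatives)
# Misaligned pair: an orthogonal vector is treated as the positive.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

In this toy setup `loss_good` comes out much smaller than `loss_bad`, which is the behavior a contrastive objective relies on: minimizing it pulls matched representations together and pushes mismatched ones apart.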

Findings

01

CIBi effectively reduces language bias in VQA models.

02

The method improves multi-modal representation and model robustness.

03

Experimental results show competitive performance across various models.

Abstract

Despite the remarkable advancements in Visual Question Answering (VQA), the challenge of mitigating the language bias introduced by textual information remains unresolved. Previous approaches capture language bias from a coarse-grained perspective. However, the finer-grained information within a sentence, such as context and keywords, can result in different biases. Because they ignore this fine-grained information, most existing methods fail to capture language bias sufficiently. In this paper, we propose a novel causal intervention training scheme named CIBi to eliminate language bias from a finer-grained perspective. Specifically, we divide the language bias into context bias and keyword bias. We employ causal intervention and contrastive learning to eliminate context bias and improve the multi-modal representation. Additionally, we design a new question-only branch based on…

Taxonomy

Methods: Contrastive Learning