Causal Debiasing for Visual Commonsense Reasoning
Jiayi Zou, Gengyun Jia, Bing-Kun Bao

TL;DR
This paper identifies biases in Visual Commonsense Reasoning datasets and proposes a causal debiasing method using backdoor adjustment and a dictionary-based approach to improve model generalization.
Contribution
The paper introduces the VCR-OOD datasets (VCR-OOD-QA and VCR-OOD-VA) for evaluating generalization across the textual and visual modalities, and applies causal inference, specifically backdoor adjustment, to reduce dataset biases.
Findings
Debiasing improves model generalization across datasets
VCR-OOD datasets reveal existing biases in VCR models
Causal methods outperform baseline debiasing approaches
Abstract
Visual Commonsense Reasoning (VCR) refers to answering questions and providing explanations based on images. While existing methods achieve high prediction accuracy, they often overlook bias in datasets and lack debiasing strategies. In this paper, our analysis reveals co-occurrence and statistical biases in both textual and visual data. We introduce the VCR-OOD datasets, comprising VCR-OOD-QA and VCR-OOD-VA subsets, which are designed to evaluate the generalization capabilities of models across two modalities. Furthermore, we analyze the causal graphs and prediction shortcuts in VCR and adopt a backdoor adjustment method to remove bias. Specifically, we create a dictionary based on the set of correct answers to eliminate prediction shortcuts. Experiments demonstrate the effectiveness of our debiasing method across different datasets.
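For reference, the backdoor adjustment named in the abstract has a standard general form, sketched below. The notation is an assumption made for illustration and is not taken from the paper: X stands for the model input (image and question), Y for the predicted answer, and Z for a confounder that influences both. The abstract does not say how the answer-based dictionary enters the adjustment; the approximation on the right assumes, as one common choice, that the dictionary entries z_1, ..., z_N serve as the strata of Z.

```latex
% General backdoor adjustment (standard result), followed by an
% assumed dictionary-based approximation with entries z_1, ..., z_N.
\[
P\bigl(Y \mid \mathrm{do}(X)\bigr)
  = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)
  \approx \sum_{i=1}^{N} P(Y \mid X, z_i)\, P(z_i)
\]
```

The key point is that each stratum is weighted by the prior P(z) rather than by P(z | X). This is what blocks the backdoor path through Z, so a model trained under this objective cannot rely on correlations between the input and the confounder as a prediction shortcut.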
