Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

Corentin Dancette; Remi Cadene; Damien Teney; Matthieu Cord

arXiv:2104.03149 · cs.CV · September 2, 2021

TL;DR

This paper introduces an evaluation methodology for visual question answering (VQA) that diagnoses multimodal shortcut learning, meaning shortcuts that involve both questions and images. It finds that current models often rely on such shortcuts and perform poorly when evaluated against them.

Contribution

It introduces VQA-CounterExamples, an evaluation protocol for identifying multimodal shortcuts in VQA datasets, and demonstrates the ineffectiveness of existing bias mitigation techniques.

Findings

01 State-of-the-art models perform poorly on the evaluation designed to expose multimodal shortcuts.

02 Existing bias reduction methods are largely ineffective against multimodal shortcuts.

03 The field's past focus on question-based biases overlooks multimodal biases that involve both questions and images.

Abstract

We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning. These cases happen when a model exploits spurious statistical regularities to produce correct answers but does not actually deploy the desired behavior. There is a need to identify possible shortcuts in a dataset and assess their use before deploying a model in the real world. The research community in VQA has focused exclusively on question-based shortcuts, where a model might, for example, answer "What is the color of the sky" with "blue" by relying mostly on the question-conditional training prior and give little weight to visual evidence. We go a step further and consider multimodal shortcuts that involve both questions and images. We first identify potential shortcuts in the popular VQA v2 training set by mining trivial predictive rules such as co-occurrences of…
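
As a rough illustration of what "mining trivial predictive rules" can look like, the sketch below counts how often small sets of tokens co-occur with an answer and keeps the sets that predict one answer often enough. It is not the authors' implementation (the official repository is listed under Code & Models). The `q:` and `v:` token prefixes for question words and image elements, the function names, the thresholds, the toy data, and the three-way grouping of test examples are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def mine_rules(examples, min_support=2, min_confidence=0.7, max_size=2):
    """Mine rules of the form {tokens} -> answer from (tokens, answer) pairs.

    Hypothetical helper: `tokens` is a set of strings mixing question words
    and image elements; a rule is kept if it is frequent and reliable enough.
    """
    antecedent_counts = Counter()  # how often each token set appears
    rule_counts = Counter()        # how often it appears with a given answer
    for tokens, answer in examples:
        items = sorted(tokens)
        for size in range(1, max_size + 1):
            for antecedent in combinations(items, size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, answer)] += 1

    rules = {}
    for (antecedent, answer), support in rule_counts.items():
        confidence = support / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            best = rules.get(antecedent)
            # Keep only the most confident answer per token set.
            if best is None or confidence > best[2]:
                rules[antecedent] = (answer, support, confidence)
    return rules


def categorize(tokens, answer, rules):
    """Group an example by how the mined rules behave on it (assumed grouping)."""
    predictions = {
        rule_answer
        for antecedent, (rule_answer, _, _) in rules.items()
        if set(antecedent) <= tokens
    }
    if not predictions:
        return "hard"            # no rule applies
    if answer in predictions:
        return "easy"            # some shortcut gives the right answer
    return "counterexample"      # every matching shortcut is wrong


# Toy training data: question tokens (q:) and image tokens (v:).
examples = [
    ({"q:color", "q:sky", "v:sky"}, "blue"),
    ({"q:color", "q:sky", "v:sky", "v:cloud"}, "blue"),
    ({"q:color", "q:sky", "v:sky", "v:sun"}, "orange"),
    ({"q:sport", "v:racket"}, "tennis"),
    ({"q:sport", "v:racket", "v:court"}, "tennis"),
]
rules = mine_rules(examples, min_support=2, min_confidence=0.6)
```

With this toy data, a sunset question whose answer is "orange" matches only rules that predict "blue", so `categorize` labels it a counterexample. A model that answers such examples correctly cannot be relying on those rules alone.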

Code & Models

Repositories

cdancette/detect-shortcuts (PyTorch, official)
