How Transferable are Reasoning Patterns in VQA?

Corentin Kervadec; Theo Jaunet; Grigory Antipov; Moez Baccouche; Romain Vuillemot; Christian Wolf

arXiv:2104.03656·cs.CV·April 9, 2021

TL;DR

This paper investigates whether reasoning patterns in VQA are transferable, showing that transferring them from a visual oracle, which is less prone to exploiting dataset biases, improves model accuracy and reduces reliance on those biases.

Contribution

It introduces a visual oracle for studying reasoning in VQA, analyzes attention mechanisms, and shows that transferring reasoning patterns enhances model performance and generalization.

Findings

01 Transferred reasoning patterns improve overall accuracy.

02 The model shows increased accuracy on infrequent answers.

03 The model depends less on dataset biases.

Abstract

Since its inception, Visual Question Answering (VQA) has been notorious as a task in which models are prone to exploit biases in datasets to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from training data, or by adding branches to models to detect and remove biases. In this paper, we argue that uncertainty in vision is a dominating factor preventing the successful learning of reasoning in vision and language problems. We train a visual oracle and, in a large-scale study, provide experimental evidence that it is much less prone to exploiting spurious dataset biases than standard models. We propose to study the attention mechanisms at work in the visual oracle and compare them with a SOTA Transformer-based model. We provide an in-depth analysis and visualizations of reasoning patterns obtained with an online visualization…
