VisQA: X-raying Vision and Language Reasoning in Transformers

Theo Jaunet; Corentin Kervadec; Romain Vuillemot; Grigory Antipov; Moez Baccouche; Christian Wolf

arXiv:2104.00926 · cs.CV · July 21, 2021

TL;DR

VisQA is a visual analytics tool that uses attention maps in transformers to explore and understand reasoning versus bias exploitation in visual question answering models, aiding interpretability and bias detection.

Contribution

It introduces a novel visualization method using attention maps to analyze reasoning processes and bias in VQA models, improving interpretability and training strategies.

Findings

01 Attention maps reveal reasoning steps in models
02 VisQA helps identify bias exploitation in VQA
03 Transfer of reasoning patterns improves model training

Abstract

Visual Question Answering systems target answering open-ended textual questions given input images. They are a testbed for learning high-level reasoning with a primary use in HCI, for instance assistance for the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers by exploiting biases and shortcuts in the training data, sometimes without even looking at the input image, instead of performing the required reasoning steps. We present VisQA, a visual analytics tool that explores this question of reasoning vs. bias exploitation. It exposes the key element of state-of-the-art neural models -- attention maps in transformers. Our working hypothesis is that reasoning steps leading to model predictions are observable from attention distributions, which are particularly useful for visualization. The design process of VisQA was motivated by well-known bias…
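To make the notion of an attention map concrete, here is a minimal sketch of generic scaled dot-product attention weights in plain Python. It is not taken from VisQA or from the model the tool analyzes. The toy embeddings and the names `attention_map`, `question_tokens`, and `image_regions` are illustrative assumptions. Each row of the result is a probability distribution over image regions for one question token, which is the kind of distribution the abstract proposes to visualize.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights.

    Returns one row per query vector and one column per key vector;
    each row sums to 1.
    """
    d = len(keys[0])  # embedding dimension
    rows = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        rows.append(softmax(scores))
    return rows


# Hypothetical toy embeddings (assumption, not data from the paper):
# 2 question tokens attending over 3 image regions.
question_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

attn = attention_map(question_tokens, image_regions)
for row in attn:
    print([round(w, 3) for w in row])
```

In this toy setup the first question token places most of its weight on the first region and the second token on the second region. A row that is nearly uniform would instead indicate that the token is not focusing on any particular region.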

Code & Models

Repositories

Theo-Jaunet/VisQA (PyTorch, official)
