Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Aishwarya Agrawal; Dhruv Batra; Devi Parikh; Aniruddha Kembhavi

arXiv:1712.00377 · cs.CV · June 5, 2018

TL;DR

This paper introduces a new VQA setting with changing answer priors, proposes a grounded model to address prior bias, and demonstrates improved robustness and interpretability over existing models.

Contribution

The paper presents VQA-CP datasets with different answer distributions and a novel GVQA model that reduces prior bias, enhancing generalization and interpretability.

Findings

01

GVQA outperforms SAN on VQA-CP datasets.

02

GVQA surpasses MCB in several cases.

03

GVQA maintains strengths on original VQA datasets.

Abstract

A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the…
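The changing-priors idea can be illustrated with a toy sketch. The code below is a simplified illustration under stated assumptions, not the authors' VQA-CP construction. It assumes each sample is reduced to a (question type, answer) pair. Within each question type, the hypothetical helper `changing_priors_split` sends whole answer groups alternately to train and test, so the answer distribution for a question type differs between the two splits. It then fits an answer-prior baseline (`fit_prior_baseline`, also hypothetical) that predicts the most frequent training answer per question type. Scoring uses exact-match `accuracy`, a simplification of the VQA consensus metric.

```python
from collections import Counter, defaultdict


def changing_priors_split(samples):
    """Toy split (hypothetical helper, NOT the VQA-CP algorithm).

    samples: list of (question_type, answer) pairs.
    Within each question type, whole answer groups are sent alternately to
    train and test, largest group first, so the per-question-type answer
    distribution in test differs from the one in train.
    """
    by_type = defaultdict(lambda: defaultdict(list))
    for qtype, answer in samples:
        by_type[qtype][answer].append((qtype, answer))

    train, test = [], []
    for answer_groups in by_type.values():
        # Largest answer group first; ties broken by the answer string.
        ordered = sorted(answer_groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        for i, (_, items) in enumerate(ordered):
            (train if i % 2 == 0 else test).extend(items)
    return train, test


def fit_prior_baseline(train):
    """Answer-prior baseline: most frequent training answer per question type."""
    counts = defaultdict(Counter)
    for qtype, answer in train:
        counts[qtype][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def accuracy(prior, data):
    """Exact-match accuracy (a simplification of the VQA consensus metric)."""
    if not data:
        return 0.0
    hits = sum(prior.get(qtype) == answer for qtype, answer in data)
    return hits / len(data)


if __name__ == "__main__":
    data = ([("what color", "red")] * 6 + [("what color", "blue")] * 4
            + [("is there", "yes")] * 7 + [("is there", "no")] * 3)
    train, test = changing_priors_split(data)
    prior = fit_prior_baseline(train)
    print(prior)                   # {'what color': 'red', 'is there': 'yes'}
    print(accuracy(prior, train))  # 1.0
    print(accuracy(prior, test))   # 0.0
```

On this toy data, the prior baseline is perfect on its training split but scores zero on the shifted test split. The same baseline fitted and scored on the pooled data reaches 0.65. This is the kind of gap a prior-driven model shows when train and test priors differ. The sketch does not reproduce how the actual VQA-CP v1 and v2 splits were built from VQA v1 and v2; see the paper for that procedure.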

Code & Models

Repositories

AishwaryaAgrawal/GVQA