TL;DR
This paper introduces a new VQA setting with changing answer priors, proposes a grounded model to address prior bias, and demonstrates improved robustness and interpretability over existing models.
Contribution
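The paper presents the VQA-CP v1 and VQA-CP v2 splits, in which train and test sets have different answer distributions for every question type, and a novel GVQA model whose architecture is designed to reduce reliance on answer priors, improving generalization and interpretability.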
Findings
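GVQA outperforms SAN (Stacked Attention Networks) on the VQA-CP v1 and VQA-CP v2 datasets.
GVQA surpasses MCB (Multimodal Compact Bilinear Pooling) in several cases.
GVQA offers strengths complementary to SAN on the original VQA v1 and VQA v2 datasets.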
Abstract
A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding. To encourage development of models geared towards the latter, we propose a new setting for VQA where for every question type, train and test sets have different prior distributions of answers. Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively). First, we evaluate several existing VQA models under this new setting and show that their performance degrades significantly compared to the original VQA setting. Second, we propose a novel Grounded Visual Question Answering model (GVQA) that contains inductive biases and restrictions in the architecture specifically designed to prevent the…
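To make the changing-priors setting concrete, here is a minimal sketch of why a model that leans on answer priors breaks when the test prior differs from the training prior. It is not the authors' split procedure or the GVQA model. The toy samples, the question-type strings, and the helper names `answer_priors` and `prior_only_accuracy` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def answer_priors(samples):
    """Map each question type to a Counter of the answers seen for it."""
    priors = defaultdict(Counter)
    for question_type, answer in samples:
        priors[question_type][answer] += 1
    return priors


def prior_only_accuracy(train, test):
    """Accuracy of a hypothetical baseline that ignores the image and always
    returns the most frequent training answer for the question type."""
    priors = answer_priors(train)
    correct = 0
    for question_type, answer in test:
        if question_type in priors:
            guess = priors[question_type].most_common(1)[0][0]
            correct += guess == answer
    return correct / len(test)


# Toy (question type, answer) pairs; invented data, not drawn from VQA.
train = (
    [("what color", "white")] * 8 + [("what color", "red")] * 2
    + [("is there", "yes")] * 7 + [("is there", "no")] * 3
)

# Test set whose per-question-type answer prior matches training.
same_prior_test = (
    [("what color", "white")] * 4 + [("what color", "red")] * 1
    + [("is there", "yes")] * 4 + [("is there", "no")] * 1
)

# Test set whose per-question-type answer prior is reversed.
changed_prior_test = (
    [("what color", "red")] * 4 + [("what color", "white")] * 1
    + [("is there", "no")] * 4 + [("is there", "yes")] * 1
)

acc_same = prior_only_accuracy(train, same_prior_test)
acc_changed = prior_only_accuracy(train, changed_prior_test)
print(f"same priors:    {acc_same:.2f}")
print(f"changed priors: {acc_changed:.2f}")
```

On this toy data the prior-only baseline looks strong when train and test share a prior and collapses when the prior is reversed, which is the kind of failure a changing-priors split is meant to expose.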
