Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?

Corentin Kervadec (LIRIS); Grigory Antipov (Orange); Moez Baccouche (Orange); Christian Wolf (LIRIS)

arXiv:2006.05121 · cs.CV · April 8, 2021

TL;DR

This paper argues that overall in-domain accuracy is a misleading metric for VQA, introduces the GQA-OOD benchmark to evaluate models separately on frequent and rare questions, and shows that existing models struggle with infrequent concepts.

Contribution

The paper proposes the GQA-OOD benchmark to evaluate VQA models on rare and frequent questions, emphasizing reasoning over dataset bias exploitation.
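The idea of scoring rare and frequent questions separately can be sketched as follows. This is a minimal illustration, not the benchmark's official code: the sample schema (`group`, `answer`), the function names, and the threshold rule with its `alpha` factor are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def split_head_tail(samples, alpha=1.2):
    """Label each sample 'head' or 'tail' by answer frequency within its group.

    samples: list of dicts with keys 'group' and 'answer' (hypothetical schema).
    alpha:   illustrative threshold factor (an assumption); an answer counts
             as 'tail' when its count is at most alpha times the mean answer
             count of its question group.
    """
    counts = defaultdict(Counter)
    for s in samples:
        counts[s["group"]][s["answer"]] += 1

    labels = []
    for s in samples:
        group_counts = counts[s["group"]]
        mean_count = sum(group_counts.values()) / len(group_counts)
        is_tail = group_counts[s["answer"]] <= alpha * mean_count
        labels.append("tail" if is_tail else "head")
    return labels


def accuracy_by_split(samples, predictions, labels):
    """Return accuracy over all samples and over the head and tail subsets."""
    correct, total = Counter(), Counter()
    for s, pred, label in zip(samples, predictions, labels):
        hit = int(pred == s["answer"])
        for key in (label, "all"):
            total[key] += 1
            correct[key] += hit
    return {key: correct[key] / total[key] for key in total}


# Toy data: one question group with a heavily skewed answer distribution.
samples = [{"group": "rose color", "answer": a}
           for a in ["red"] * 8 + ["white", "yellow"]]
predictions = ["red"] * len(samples)  # a model that always guesses "red"

labels = split_head_tail(samples)
scores = accuracy_by_split(samples, predictions, labels)
print(scores)
```

On this toy data the always-"red" guesser looks strong overall yet gets no tail question right, which is the kind of gap a rare/frequent split is meant to expose.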

Findings

01

Models perform poorly on infrequent concepts.

02

Standard accuracy metrics are misleading for reasoning evaluation.

03

Bias reduction techniques have limited success on rare questions.

Abstract

Models for Visual Question Answering (VQA) are notorious for their tendency to rely on dataset biases, as the large and unbalanced diversity of questions and concepts involved tends to prevent models from learning to reason, leading them to perform educated guesses instead. In this paper, we claim that the standard evaluation metric, which consists in measuring the overall in-domain accuracy, is misleading. Since questions and concepts are unbalanced, this tends to favor models which exploit subtle training set statistics. Alternatively, naively introducing artificial distribution shifts between train and test splits is also not completely satisfying. First, the shifts do not reflect real-world tendencies, resulting in unsuitable models; second, since the shifts are handcrafted, trained models are specifically designed for this particular setting, and do not generalize to other…
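The claim that overall in-domain accuracy rewards exploiting training set statistics can be made concrete with a small worked example. The sketch below is an assumption-laden toy, not the paper's protocol: the answer counts are invented, and the per-answer mean accuracy is used only as a simple contrast to overall accuracy, not as the metric the authors propose.

```python
from collections import Counter


def overall_accuracy(golds, preds):
    """Fraction of samples answered correctly."""
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def per_answer_mean_accuracy(golds, preds):
    """Accuracy computed per gold answer, then averaged over answers."""
    hits, totals = Counter(), Counter()
    for g, p in zip(golds, preds):
        totals[g] += 1
        hits[g] += int(g == p)
    return sum(hits[a] / totals[a] for a in totals) / len(totals)


# Invented, unbalanced training answers for a single question type.
train_answers = ["red"] * 90 + ["white"] * 6 + ["yellow"] * 4

# A prior-only "model": it never looks at the image and always returns
# the most frequent training answer.
majority_answer = Counter(train_answers).most_common(1)[0][0]

# In-domain test set drawn with the same imbalance.
test_golds = ["red"] * 18 + ["white", "yellow"]
test_preds = [majority_answer] * len(test_golds)

overall = overall_accuracy(test_golds, test_preds)
balanced = per_answer_mean_accuracy(test_golds, test_preds)
print(f"overall={overall:.2f} per-answer mean={balanced:.2f}")
```

Here the guesser reaches 0.90 overall accuracy while averaging only about 0.33 across answers, because it is right on the single dominant answer and wrong on every other one.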

Code & Models

Repositories

gqa-ood/gqa-ood
