Can you even tell left from right? Presenting a new challenge for VQA

Sai Raam Venkatraman; Rishi Rao; S. Balasubramanian; Chandra Sekhar Vorugunti; R. Raghunatha Sarma

arXiv:2203.07664 · cs.CV · March 16, 2022

TL;DR

This paper introduces UOUC, a synthetic VQA dataset designed to evaluate and improve models' compositional generalisation, reasoning, and memorisation abilities, addressing the weak compositional separation between the train and test sets of existing datasets.

Contribution

The creation of UOUC, a large, compositionally well-separated synthetic dataset with diverse questions that challenge and evaluate the compositional and reasoning skills of VQA models.

Findings

1. Current VQA models show poor compositional generalisation.
2. Models perform relatively worse on simple reasoning tasks.
3. UOUC is a strong benchmark for future VQA research.

Abstract

Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is compositional generalisation: the ability of a model to answer well on scenes whose setups differ from those in the training set. Evaluating it therefore requires datasets whose train and test sets differ significantly in composition. In this work, we present several quantitative measures of compositional separation and find that popular VQA datasets are not good evaluators of compositional generalisation. To address this, we present Uncommon Objects in Unseen Configurations (UOUC), a synthetic VQA dataset that is both fairly complex and compositionally well-separated. The object classes of UOUC consist of 380 classes taken from 528 characters from the Dungeons and Dragons game. The train set of UOUC consists of 200,000 scenes;…
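The abstract does not define its separation measures, so the following is not one of them. It is a hypothetical proxy for compositional separation, assuming each scene is reduced to the set of object classes it contains: the fraction of object-class pairs that co-occur in some test scene but in no training scene. A value near 1.0 means test scenes mostly show unseen configurations; a value near 0.0 means the split is poorly separated. The function names and the toy class labels are illustrative assumptions.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def pair_set(scenes: Iterable[Set[str]]) -> Set[Tuple[str, str]]:
    """Collect every unordered pair of object classes that co-occur in at least one scene."""
    pairs: Set[Tuple[str, str]] = set()
    for scene in scenes:
        # Sorting makes ("a", "b") and ("b", "a") the same pair.
        pairs.update(combinations(sorted(scene), 2))
    return pairs


def unseen_pair_fraction(train_scenes: Iterable[Set[str]],
                         test_scenes: Iterable[Set[str]]) -> float:
    """Hypothetical separation proxy: share of test class pairs never seen together in training.

    Returns 0.0 when the test set contains no class pairs at all.
    """
    train_pairs = pair_set(train_scenes)
    test_pairs = pair_set(test_scenes)
    if not test_pairs:
        return 0.0
    unseen = test_pairs - train_pairs
    return len(unseen) / len(test_pairs)


if __name__ == "__main__":
    # Toy scenes with made-up class labels (assumption, not UOUC data).
    train = [{"goblin", "owlbear"}, {"beholder", "goblin"}]
    test = [{"beholder", "owlbear"}, {"beholder", "goblin"}]
    print(unseen_pair_fraction(train, test))  # prints 0.5
```

Pairwise co-occurrence is only one possible notion of a "configuration"; a measure that also accounts for spatial relations or object counts would separate splits more finely.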

Videos

Can You Even Tell Left From Right? Presenting a New Challenge for VQA · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning