A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

Dustin Schwenk; Apoorv Khandelwal; Christopher Clark; Kenneth Marino; Roozbeh Mottaghi

arXiv:2206.01718 · cs.CV · June 6, 2022 · 1 citation



TL;DR

A-OKVQA is a challenging new dataset for visual question answering. Its questions require broad commonsense and world knowledge, with the aim of advancing AI reasoning beyond simple fact retrieval.

Contribution

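The paper introduces A-OKVQA, a diverse crowdsourced dataset of about 25K questions that demand commonsense reasoning and world knowledge. It addresses limitations of previous VQA datasets: simplistic and repetitive questions, little need for knowledge outside the paired image, and limited reasoning required to reach the answer.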

Findings

01. State-of-the-art models perform poorly on A-OKVQA

02. Questions require reasoning beyond simple image queries

03. Dataset promotes development of more intelligent VQA systems

Abstract

The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs. Despite a proliferation of VQA datasets, this goal is hindered by a set of common limitations. These include a reliance on relatively simplistic questions that are repetitive in both concepts and linguistic structure, little world knowledge needed outside of the paired image, and limited reasoning required to arrive at the correct answer. We introduce A-OKVQA, a crowdsourced dataset composed of a diverse set of about 25K questions requiring a broad base of commonsense and world knowledge to answer. In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the scene…


Code & Models

Repositories

allenai/aokvqa (PyTorch, official)

Models

tuandunghcmut/vlmeval (Hugging Face model)


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection