Visual Question Answering: A Survey of Methods and Datasets

Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel

arXiv:1607.05910 · cs.CV · July 21, 2016 · 44 cites

TL;DR

This survey reviews current methods and datasets in Visual Question Answering, highlighting approaches that combine visual and textual reasoning, and discusses future directions involving knowledge bases and NLP models.

Contribution

It provides a comprehensive classification of VQA methods and an in-depth review of datasets, emphasizing the integration of structured knowledge and reasoning capabilities.

Findings

01 Comparison of neural network-based approaches
02 Analysis of datasets, including Visual Genome
03 Discussion of future research directions

Abstract

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA…
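The abstract singles out the most common design: a convolutional network encodes the image, a recurrent network encodes the question, and both are mapped into a common feature space from which an answer is predicted. The sketch below is a minimal, untrained illustration of that joint-embedding pipeline in plain Python, not the method of any specific paper covered by the survey. It assumes the image has already been reduced to a fixed-length CNN feature vector (here a made-up 4-dimensional list), uses a plain tanh RNN as a stand-in for the LSTM or GRU encoders usually employed, fuses the two embeddings by element-wise product (one option among several, alongside concatenation or bilinear pooling), and treats answering as classification over a fixed answer vocabulary. All class, function, and parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class JointEmbeddingVQA:
    """Hypothetical toy joint-embedding VQA classifier (untrained, illustration only)."""

    UNK = "<unk>"

    def __init__(self, vocab: Sequence[str], answers: Sequence[str], img_dim: int,
                 hid_dim: int = 8, joint_dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        words = list(vocab) + [self.UNK]          # unknown words share one embedding
        self.word_index = {w: i for i, w in enumerate(words)}
        self.answers = list(answers)
        self.hid_dim = hid_dim
        self.E = random_matrix(len(words), hid_dim, rng)               # word embeddings
        self.W_xh = random_matrix(hid_dim, hid_dim, rng)               # input-to-hidden
        self.W_hh = random_matrix(hid_dim, hid_dim, rng)               # hidden-to-hidden
        self.W_img = random_matrix(joint_dim, img_dim, rng)            # image -> joint space
        self.W_q = random_matrix(joint_dim, hid_dim, rng)              # question -> joint space
        self.W_out = random_matrix(len(self.answers), joint_dim, rng)  # joint -> answer logits

    def encode_question(self, tokens: Sequence[str]) -> List[float]:
        """Run a plain tanh RNN over the tokens; the final hidden state encodes the question."""
        h = [0.0] * self.hid_dim
        for tok in tokens:
            x = self.E[self.word_index.get(tok, self.word_index[self.UNK])]
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
        return h

    def answer_distribution(self, img_feat: Sequence[float],
                            tokens: Sequence[str]) -> Dict[str, float]:
        # Project both modalities into the same joint_dim-dimensional space.
        v = [math.tanh(z) for z in matvec(self.W_img, img_feat)]
        q = [math.tanh(z) for z in matvec(self.W_q, self.encode_question(tokens))]
        # Fuse by element-wise product, then score every candidate answer.
        joint = [a * b for a, b in zip(v, q)]
        logits = matvec(self.W_out, joint)
        # Numerically stable softmax over the fixed answer vocabulary.
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return {ans: e / total for ans, e in zip(self.answers, exps)}


# Usage: a made-up 4-dimensional "CNN feature" stands in for a real image descriptor.
model = JointEmbeddingVQA(vocab=["what", "color", "is", "the", "cat"],
                          answers=["black", "white", "two"], img_dim=4)
probs = model.answer_distribution([0.2, 0.0, 0.7, 0.1], "what color is the cat".split())
prediction = max(probs, key=probs.get)
```

Because the weights here are random, `prediction` is arbitrary. In a trained model of this kind, the matrices would typically be fitted with a cross-entropy loss on (image, question, answer) triples; casting answering as classification over a fixed set of frequent answers is what lets a single softmax layer serve as the answer generator.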

Code & Models

Repositories: AI-metrics/AI-metrics

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning