Visual Question Answering: Datasets, Algorithms, and Future Challenges

Kushal Kafle; Christopher Kanan

arXiv:1610.01465·cs.CV·June 16, 2017

TL;DR

This paper reviews the evolution of Visual Question Answering (VQA), a problem at the intersection of computer vision and NLP. It analyzes datasets, algorithms, and open challenges, highlights the limitations of current resources, and proposes directions for future research.

Contribution

It provides a comprehensive critique of existing VQA datasets and algorithms, and discusses future challenges and directions for the field.

Findings

1. Current datasets are limited in their ability to properly train and assess VQA algorithms.
2. Many algorithms have been proposed, with varying effectiveness.
3. Future research should address dataset limitations and explore new algorithmic approaches.

Abstract

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.
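Among the evaluation metrics used in VQA, one widely used with the VQA dataset is consensus accuracy. It credits a predicted answer in proportion to how many human annotators gave that same answer and caps the credit at 1. The sketch below shows only the simplified form, min(matches / 3, 1). The function name `vqa_consensus_accuracy` is hypothetical. The lowercase-and-strip comparison is an assumption: the official evaluation script normalizes answers more extensively and averages over subsets of annotators.

```python
from typing import Sequence


def vqa_consensus_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Simplified consensus accuracy: min(#matching human answers / 3, 1).

    Hypothetical helper for illustration; not the official VQA evaluation code.
    """
    # Assumption: answers match if they are equal after lowercasing and
    # stripping surrounding whitespace (the official script does more).
    pred = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == pred)
    # Three or more agreeing annotators give full credit.
    return min(matches / 3.0, 1.0)


if __name__ == "__main__":
    # Ten made-up annotator answers for one question, for illustration only.
    humans = ["two", "2", "two", "two", "three", "two", "2", "two", "two", "two"]
    print(vqa_consensus_accuracy("two", humans))   # 7 matches -> 1.0
    print(vqa_consensus_accuracy("2", humans))     # 2 matches -> 0.666...
    print(vqa_consensus_accuracy("four", humans))  # 0 matches -> 0.0
```

The example shows why answer normalization matters: "2" and "two" count as different answers here, so the same semantic answer can receive only partial credit.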

Code & Models

Repositories

andreiluca96/reversed-vqa