Visual Question Answering: which investigated applications?

Silvio Barra; Carmen Bisogni; Maria De Marsico; Stefano Ricciardi

arXiv:2103.02937·cs.CV·March 9, 2021

TL;DR

This paper reviews the state of Visual Question Answering (VQA), emphasizing real-world applications and domain-specific datasets, and discusses recent challenges in the field.

Contribution

It shifts focus from general datasets to application-oriented proposals and benchmarks, highlighting recent challenges in VQA research.

Findings

01 Most VQA works rely on general-purpose datasets.

02 Application-specific datasets are crucial for real-world VQA.

03 Recent challenges include dataset bias and multimodal reasoning difficulties.

Abstract

Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Computer Vision (CV) and Natural Language Processing (NLP) have recently met. In image captioning and video summarization, the semantic information is completely contained in still images or video dynamics, and it only has to be mined and expressed in a human-consistent way. In VQA, by contrast, the semantic information in the same media must be compared with the semantics implied by a question expressed in natural language, doubling the artificial intelligence-related effort. Some recent surveys of VQA approaches have focused on the methods underlying either the image-related or the verbal-related processing, or on how to consistently fuse the conveyed information. Possible applications are only suggested, and, in fact, most cited works rely on general-purpose datasets that are…

Code & Models

Repositories

gangasani-anusha/Short-Story_Assignment
