Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature

Ana Cláudia Akemi Matsuki de Faria; Felype de Castro Bastos; José Victor Nogueira Alves da Silva; Vitor Lopes Fabris; Valeska de Sousa Uchoa; Décio Gonçalves de Aguiar Neto; Claudio Filipi Goncalves dos Santos

arXiv:2305.11033 · cs.CV · June 5, 2023 · 6 cites

Open Access

TL;DR

This survey reviews recent advances in Visual Question Answering (VQA), analyzing 25 studies and 6 datasets to identify trends, challenges, and future directions in this emerging interdisciplinary field.

Contribution

It provides a comprehensive analysis and comparison of recent VQA research, highlighting common errors, state-of-the-art results, and potential areas for improvement.

Findings

1. Identified key datasets used in VQA research.
2. Summarized common challenges and errors in current methods.
3. Outlined future research directions and potential improvements.

Abstract

Visual Question Answering (VQA) is an emerging area of interest for researchers and a recent problem combining natural language processing and image prediction: an algorithm must answer questions about given images. For this survey, 25 recent studies were analyzed, along with 6 datasets, for which download links are provided. The survey investigates these works and provides a deeper analysis and comparison among them, covering results, the state of the art, common errors, and possible points of improvement for future researchers.
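To make the task concrete, below is a minimal sketch of one common way VQA is framed: classification over a fixed set of candidate answers, using a joint representation of the image and the question. Everything here is a hypothetical toy. The image features stand in for the output of a pretrained image encoder. The word vectors stand in for a learned question encoder. The `head` weights stand in for a trained classifier. Element-wise product fusion is only one simple choice among the many fusion strategies in the literature, and it is not presented as the method of any surveyed paper.

```python
from typing import Dict, List, Tuple

# Hypothetical classifier head: each candidate answer maps to (weight vector, bias).
AnswerHead = Dict[str, Tuple[List[float], float]]


def embed_question(question: str, word_vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Bag-of-words question encoder: average the vectors of known tokens."""
    tokens = [t.strip("?.,!").lower() for t in question.split()]
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def fuse(image_feat: List[float], question_feat: List[float]) -> List[float]:
    """Element-wise product: one simple way to combine the two modalities."""
    return [a * b for a, b in zip(image_feat, question_feat)]


def answer_question(image_feat: List[float], question: str,
                    word_vectors: Dict[str, List[float]], head: AnswerHead) -> str:
    """Score every candidate answer with a linear layer and return the highest-scoring one."""
    joint = fuse(image_feat, embed_question(question, word_vectors, len(image_feat)))
    scores = {
        ans: sum(w * x for w, x in zip(weights, joint)) + bias
        for ans, (weights, bias) in head.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    image = [1.0, 0.0]  # toy "image features": dimension 0 is active (a cat is present)
    words = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    head = {"yes": ([1.0, 1.0], 0.0), "no": ([-1.0, -1.0], 0.5)}
    print(answer_question(image, "Is there a cat?", words, head))  # yes
    print(answer_question(image, "Is there a dog?", words, head))  # no
```

In a real system, each toy component would be replaced by a learned model, and the answer set would be drawn from the training data of a VQA dataset. The overall structure of encoding both inputs, fusing them, and scoring candidate answers stays the same.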

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning