Recent, rapid advancement in visual question answering architecture: a review

Venkat Kodali; Daniel Berleant

arXiv:2203.01322·cs.CV·July 12, 2022·1 cites

Open Access

TL;DR

This review paper summarizes recent rapid advancements in visual question answering architectures, emphasizing the growth of multimodal systems and their significance in AI over the past few years.

Contribution

It provides an updated overview of recent developments in visual question answering architectures, building on previous reviews and highlighting new trends and improvements.

Findings

01

Significant growth in research on multimodal architectures

02

Recent improvements in VQA system accuracy and efficiency

03

Increased importance of VQA in AI applications

Abstract

Understanding visual question answering is going to be crucial for numerous human activities. However, it presents major challenges at the heart of the artificial intelligence endeavor. This paper presents an update on the rapid advancements in image-based visual question answering that have occurred in the last couple of years. A large volume of research on improving visual question answering system architecture has been published recently, showing the importance of multimodal architectures. The present article builds on the review paper by Manmadhan et al. (2020), which mentions several points on the benefits of visual question answering, and covers subsequent updates in the field.

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning