Medical Visual Question Answering: A Survey

Zhihong Lin, Donghao Zhang, Qingyi Tao, Danli Shi, Gholamreza Haffari, Qi Wu, Mingguang He, and Zongyuan Ge

arXiv:2111.10056 · cs.CV · June 12, 2023 · Artif. Intell. Medicine · 6 cites

Open Access

TL;DR

This survey reviews the current state of medical visual question answering, covering datasets, methods, challenges, and future directions to guide researchers in advancing this specialized AI field.

Contribution

It provides a comprehensive overview of medical VQA datasets, approaches, challenges, and future research directions, filling a gap in focused survey literature.

Findings

01. Summarizes publicly available medical VQA datasets.
02. Analyzes techniques and innovations in medical VQA methods.
03. Discusses medical-specific challenges and future research directions.

Abstract

Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges. Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer. Although general-domain VQA has been extensively studied, medical VQA still needs specific investigation and exploration due to its task features. In the first part of this survey, we collect the medical VQA datasets that are publicly available to date and discuss their data sources, data quantities, and task features. In the second part, we review the approaches used in medical VQA tasks. We summarize and discuss their techniques, innovations, and potential improvements. In the last part, we analyze some medical-specific challenges for the field and discuss future research directions. Our goal is to provide…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques