A Survey of Medical Vision-and-Language Applications and Their Techniques

Qi Chen; Ruoshan Zhao; Sinuo Wang; Vu Minh Hieu Phan; Anton van den Hengel; Johan Verjans; Zhibin Liao; Minh-Son To; Yong Xia; Jian Chen; Yutong Xie; Qi Wu

arXiv:2411.12195·cs.CV·November 20, 2024

Open Access · 1 Repo

TL;DR

This survey reviews medical vision-and-language models (MVLMs), their architectures, applications, datasets, and evaluation metrics, highlighting challenges and future directions in integrating visual and textual medical data for improved healthcare outcomes.

Contribution

It provides a comprehensive overview and analysis of MVLM architectures, datasets, and applications, offering insights into current challenges and future research trends in medical vision-and-language models.

Findings

01. MVLMs enable automated medical report generation and question answering.

02. Different model architectures employ various strategies for cross-modal integration.

03. Standardized evaluation metrics are used to compare model performance.

Abstract

Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Their applications are versatile and have the potential to improve diagnostic accuracy and decision-making for individual patients while also contributing to enhanced public health monitoring, disease surveillance, and policy-making through more efficient analysis of large data sets. MVLMs integrate natural language processing with medical images to enable a more comprehensive and contextual understanding of medical images alongside their corresponding textual information. Unlike general vision-and-language models trained on diverse, non-specialized datasets, MVLMs are purpose-built for the medical domain, automatically extracting and interpreting critical information from medical images and textual reports to…

Code & Models

Repositories

ytongxie/medical-vision-and-language-tasks-and-methodologies-a-survey
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques