Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye, Hao Tang

TL;DR
Drawing on 330 recent studies, this survey reviews the development and application of multimodal large language models (MLLMs) in medicine, highlighting their capabilities, challenges, and potential in healthcare tasks such as diagnosis and treatment.
Contribution
It provides a comprehensive overview of MLLMs in healthcare, summarizing fundamental concepts, applications, data modes, benchmarks, and challenges, offering a valuable resource for future research.
Findings
MLLMs demonstrate strong capabilities in medical reporting, diagnosis, and treatment.
Six data modes and their evaluation benchmarks are identified.
Challenges include data privacy, model interpretability, and domain-specific adaptation.
Abstract
MLLMs have recently become a focal point in the field of artificial intelligence research. Building on the strong capabilities of LLMs, MLLMs are adept at addressing complex multi-modal tasks. With the release of GPT-4, MLLMs have gained substantial attention from different domains. Researchers have begun to explore the potential of MLLMs in the medical and healthcare domain. In this paper, we first introduce the background and fundamental concepts related to LLMs and MLLMs, while emphasizing the working principles of MLLMs. Subsequently, we summarize three main directions of application within healthcare: medical reporting, medical diagnosis, and medical treatment. Our findings are based on a comprehensive review of 330 recent papers in this area. We illustrate the remarkable capabilities of MLLMs in these domains by providing specific examples. For data, we present six mainstream…
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Absolute Position Encodings
