Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports
Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

TL;DR
This study evaluates the potential of multimodal large language models like Gemini and GPT-4 across diverse medical imaging datasets, highlighting their strengths and limitations in various diagnostic tasks and their promise for clinical integration.
Contribution
The paper provides an extensive evaluation of Gemini and GPT-4 models on multiple medical imaging tasks, revealing their capabilities and challenges in clinical data mining applications.
Findings
Gemini excels in report generation and lesion detection.
GPT models are proficient in lesion segmentation and localization.
Both models show promise in reducing physician workload.
Abstract
Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence (AGI) for computer vision, showcasing their potential in the biomedical domain. In this study, we exhaustively evaluated Gemini, GPT-4, and 4 other popular large models across 14 medical imaging datasets spanning 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and…
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Topic Modeling · Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Label Smoothing · Attention Dropout · Adam · Dropout
