Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V

Zhiling Yan; Kai Zhang; Rong Zhou; Lifang He; Xiang Li; Lichao Sun

arXiv:2310.19061 · cs.CV · October 31, 2023 · 33 cites

TL;DR

This study evaluates GPT-4V's performance on medical visual question answering across diverse datasets, revealing its current limitations and unreliability for real-world diagnostic applications.

Contribution

It provides a comprehensive assessment of GPT-4V's capabilities in medical VQA, highlighting its constraints and informing future improvements.

Findings

01. GPT-4V shows unreliable accuracy in medical diagnostics.
02. The model's behavior has seven distinct limitations.
03. GPT-4V is not recommended for real-world medical diagnostics.

Abstract

In this paper, we critically evaluate the capabilities of the state-of-the-art multimodal large language model GPT-4 with Vision (GPT-4V) on the Visual Question Answering (VQA) task. Our experiments thoroughly assess GPT-4V's proficiency in answering questions paired with images, using both pathology and radiology datasets that cover 11 modalities (e.g., Microscopy, Dermoscopy, X-ray, and CT) and 15 objects of interest (e.g., brain, liver, and lung). Our datasets encompass a comprehensive range of medical inquiries, including 16 distinct question types. Throughout our evaluations, we devised textual prompts for GPT-4V, directing it to combine visual and textual information. The experiments, scored by accuracy, conclude that the current version of GPT-4V is not recommended for real-world diagnostics due to its unreliable and suboptimal accuracy in responding to diagnostic medical…
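
The abstract reports accuracy over datasets that vary by modality, object of interest, and question type. As a rough illustration of how such a breakdown could be computed, here is a minimal sketch. It assumes exact-match scoring after light normalization, which may differ from the paper's actual protocol; the record fields (`modality`, `question_type`, `answer`, `prediction`) and the sample records are hypothetical and are not taken from the paper's datasets.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase, trim whitespace, and drop trailing periods before comparing."""
    return answer.strip().lower().rstrip(".")


def accuracy_by_group(records, group_key):
    """Return {group: fraction of predictions that exactly match the reference}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if normalize(rec["prediction"]) == normalize(rec["answer"]):
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical records for illustration only (not real evaluation data).
records = [
    {"modality": "X-ray", "question_type": "presence", "answer": "yes", "prediction": "Yes."},
    {"modality": "X-ray", "question_type": "organ", "answer": "lung", "prediction": "heart"},
    {"modality": "CT", "question_type": "presence", "answer": "no", "prediction": "no"},
]

by_modality = accuracy_by_group(records, "modality")
by_question_type = accuracy_by_group(records, "question_type")
print(by_modality)
print(by_question_type)
```

Exact match suits closed-ended (yes/no or single-word) answers; open-ended answers would likely need a different scoring rule.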

Code & Models

Repositories

zhilingyan/gpt4v-medical-report (Official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Label Smoothing · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection