GPT-4V Cannot Generate Radiology Reports Yet

Yuyang Jiang; Chacha Chen; Dang Nguyen; Benjamin M. Mervak; Chenhao Tan

arXiv:2407.12176 · cs.CY · November 18, 2024 · 2 cites

TL;DR

This study systematically evaluates GPT-4V's ability to generate radiology reports from chest X-rays, revealing significant shortcomings in image understanding and report quality, thus questioning its suitability for clinical use.

Contribution

The paper provides a comprehensive assessment of GPT-4V's performance in radiology report generation, highlighting its limitations in medical image reasoning and report synthesis.

Findings

01

GPT-4V performs poorly on both lexical metrics and clinical efficacy metrics.

02

The model's image reasoning performance is consistently low across prompts.

03

Generated reports are less accurate and less natural than those from fine-tuned models.

Abstract

GPT-4V's purported strong multimodal abilities have raised interest in using it to automate radiology report writing, but thorough evaluations are lacking. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (ground-truth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant…
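
The image reasoning step described in the abstract is a multi-label prediction task: each image maps to a set of medical condition labels. The sketch below shows one way such predictions might be scored, assuming micro-averaged precision, recall, and F1 over label sets. The function name `micro_prf` and the condition names are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-image label sets.

    `gold` and `pred` are parallel lists: one set of condition labels per image.
    This is a hypothetical helper for illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of images")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # conditions predicted and actually present
        fp += len(p - g)   # conditions predicted but absent
        fn += len(g - p)   # conditions present but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up labels for three images (assumed names, not real data).
    gold = [{"Cardiomegaly", "Edema"}, {"Pneumonia"}, set()]
    pred = [{"Cardiomegaly"}, {"Edema"}, {"Edema"}]
    print(micro_prf(gold, pred))
```

In this toy input the model predicts "Edema" for two images regardless of the ground truth, which is the kind of input-insensitive behavior that a near-constant predicted label distribution would produce, and the scores drop accordingly.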

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Radiology practices and education · Artificial Intelligence in Healthcare and Education