Exploring Multimodal Large Language Models for Radiology Report Error-checking

Jinge Wu; Yunsoo Kim; Eva C. Keller; Jamie Chow; Adam P. Levine; Nikolas Pontikos; Zina Ibrahim; Paul Taylor; Michelle C. Williams; Honghan Wu

arXiv:2312.13103·cs.CL·March 5, 2024·1 cites

TL;DR

This study demonstrates that multimodal large language models can effectively assist radiologists in error detection within reports, outperforming baseline models and even surpassing some clinicians in accuracy, especially in binary error-checking tasks.

Contribution

The paper introduces a novel application of multimodal LLMs for error-checking in radiology reports, including a new evaluation dataset and fine-tuning techniques for improved performance.

Findings

01. At the SIMPLE (binary error-checking) level, the fine-tuned model improved performance by 47.4% on MIMIC-CXR.

02. At the SIMPLE level, the fine-tuned model outperformed the baseline by 25.4% on the IU X-ray dataset.

03. Ensemble mode correctly identified 71.4% of challenging cases.

Abstract

This paper proposes one of the first clinical applications of multimodal large language models (LLMs) as an assistant for radiologists to check errors in their reports. We created an evaluation dataset from real-world radiology datasets (including X-rays and CT scans). A subset of the original reports was modified to contain synthetic errors of three types: "insert", "remove", and "substitute". The evaluation contained two difficulty levels: SIMPLE for binary error-checking and COMPLEX for identifying error types. At the SIMPLE level, our fine-tuned model improved performance by 47.4% and 25.4% on MIMIC-CXR and IU X-ray data, respectively. This performance boost was also observed on an unseen modality, CT scans, where the model performed 19.46% better than the baseline model. The model also surpassed the domain expert's accuracy in the MIMIC-CXR dataset by…
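The three synthetic error types and the two evaluation levels described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the abstract does not say at what granularity errors were introduced, so the sketch assumes sentence-level edits, and the names `corrupt_report`, `make_example`, and the `distractor` sentence are hypothetical.

```python
import random

ERROR_TYPES = ("insert", "remove", "substitute")


def split_sentences(report):
    # Naive splitter (assumption): breaks on every period, so it would
    # mis-split decimals such as "1.5 cm". Good enough for a sketch.
    return [s.strip() for s in report.split(".") if s.strip()]


def join_sentences(sentences):
    return ". ".join(sentences) + "."


def corrupt_report(report, error_type, distractor, rng):
    """Apply one sentence-level synthetic error (assumed granularity)."""
    sentences = split_sentences(report)
    if not sentences:
        raise ValueError("report has no sentences")
    idx = rng.randrange(len(sentences))
    distractor = distractor.strip().rstrip(".")
    if error_type == "insert":
        # Add a finding that was not in the original report.
        sentences.insert(idx, distractor)
    elif error_type == "remove":
        if len(sentences) < 2:
            raise ValueError("cannot remove from a one-sentence report")
        # Drop one original finding.
        del sentences[idx]
    elif error_type == "substitute":
        # Replace one original finding with a different one.
        sentences[idx] = distractor
    else:
        raise ValueError(f"unknown error type: {error_type}")
    return join_sentences(sentences)


def make_example(report, distractor, rng, error_rate=0.5):
    """Build one evaluation item carrying both SIMPLE and COMPLEX labels."""
    if rng.random() < error_rate:
        error_type = rng.choice(ERROR_TYPES)
        text = corrupt_report(report, error_type, distractor, rng)
    else:
        error_type = "none"
        text = join_sentences(split_sentences(report))
    return {
        "report": text,
        "simple_label": error_type != "none",  # SIMPLE: is there an error?
        "complex_label": error_type,           # COMPLEX: which error type?
    }
```

With a three-sentence report, "insert" yields four sentences, "remove" yields two, and "substitute" keeps three with one replaced. The SIMPLE label is derived from the COMPLEX label, which mirrors the relationship between the two difficulty levels: binary error-checking versus identifying the error type.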


Taxonomy

Topics: Topic Modeling · Radiology practices and education · Artificial Intelligence in Healthcare and Education

Methods: Sparse Evolutionary Training