Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations

Semil Eminovic; Bogdan Levita; Andrea Dell’Orco; Jonas Alexander Leppig; Jawed Nawabi; Tobias Penzkofer

PMC · DOI: 10.3390/jpm15060235 · June 5, 2025


Open Access

TL;DR

This study compares how accurately four state-of-the-art large language models answer patient questions about CT and MRI examinations, showing that they can support patient education but also risk spreading misinformation.

Contribution

The study evaluates multiple LLMs for patient education in radiology, identifying performance differences and highlighting clinical risks.

Findings

1. ChatGPT-4o scored highest on CT-related questions and tied for the highest score on MRI-related questions.

2. Google Gemini produced the most potentially misleading responses, especially for CT.

3. Mistral Large 2 underperformed the other models on CT-related questions.

Abstract

Background/Objectives: This study compares the accuracy of responses from state-of-the-art large language models (LLMs) to patient questions asked before CT and MRI imaging. We aim to demonstrate the potential of LLMs to improve workflow efficiency while also highlighting risks such as misinformation. Methods: A total of 57 CT-related and 64 MRI-related patient questions were posed to ChatGPT-4o, Claude 3.5 Sonnet, Google Gemini, and Mistral Large 2. Each answer was evaluated by two board-certified radiologists and scored for accuracy/correctness/likelihood to mislead on a 5-point Likert scale. Statistical analyses compared LLM performance across question categories. Results: ChatGPT-4o achieved the highest average scores for CT-related questions and tied with Claude 3.5 Sonnet for MRI-related questions, with higher scores across all models for MRI (ChatGPT-4o: CT [4.52 (± 0.46)], MRI: [4.79 (±…
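The scores above are reported as mean (± SD) of 5-point Likert ratings from two raters. As a minimal sketch of how such a summary can be computed, the snippet below assumes (this is not stated in the abstract) that the two radiologists' scores are first averaged per question, and that the mean and sample standard deviation are then taken across questions for each model. The model names and scores are hypothetical placeholders, not data from the study, and the helper `summarize` is illustrative only.

```python
from statistics import mean, stdev

# Hypothetical placeholder ratings, NOT data from the study.
# Each model maps to one (rater_1, rater_2) Likert pair per question.
ratings = {
    "Model A": [(5, 4), (4, 4), (5, 5), (3, 4)],
    "Model B": [(4, 3), (5, 4), (4, 4), (2, 3)],
}


def summarize(pairs):
    """Average the two raters per question (assumed scheme), then return
    the mean and sample standard deviation across questions."""
    for r1, r2 in pairs:
        if not (1 <= r1 <= 5 and 1 <= r2 <= 5):
            raise ValueError("Likert scores must lie between 1 and 5")
    per_question = [(r1 + r2) / 2 for r1, r2 in pairs]
    return mean(per_question), stdev(per_question)


for model, pairs in ratings.items():
    m, sd = summarize(pairs)
    # Same "mean (± SD)" layout as the abstract reports
    print(f"{model}: {m:.2f} (± {sd:.2f})")
```

Averaging the raters before aggregating treats each question as one observation; scoring each rater separately instead would double the sample size and usually shrink the SD, so the choice should match whatever the full paper specifies.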




Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education