Performance of Large Language Models in Technical MRI Question Answering: A Comparative Study

Alan B McMillan

arXiv:2411.12238 · physics.med-ph · November 20, 2024

TL;DR

This study evaluates how accurately large language models answer technical MRI questions. Both closed-source and open-source models performed well (94% for o1 Preview and 78% for Phi 3.5 Mini), suggesting potential to improve MRI practice and education.

Contribution

The paper compares closed-source and open-source LLMs on 570 technical MRI questions spanning nine topics, highlighting their potential in clinical and educational settings.

Findings

1. Closed-source o1 Preview achieved 94% accuracy.
2. Open-source Phi 3.5 Mini achieved 78% accuracy.
3. Models performed best in Basic Principles and Instrumentation.

Abstract

Background: Advances in artificial intelligence, particularly large language models (LLMs), have the potential to enhance technical expertise in magnetic resonance imaging (MRI), regardless of operator skill or geographic location.

Methods: We assessed the accuracy of several LLMs in answering 570 technical MRI questions derived from a standardized review book. The questions spanned nine MRI topics, including Basic Principles, Image Production, and Safety. Closed-source models (e.g., OpenAI's o1 Preview, GPT-4o, GPT-4 Turbo, and Claude 3.5 Haiku) and open-source models (e.g., Phi 3.5 Mini, Llama 3.1, smolLM2) were tested. Models were queried using standardized prompts via the LangChain framework, and responses were graded against correct answers using an automated scoring protocol. Accuracy, defined as the proportion of correct answers, was the primary outcome.

Results: The…
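
The abstract defines accuracy as the proportion of correct answers and mentions automated grading, but it does not give the scoring details. The following is a minimal Python sketch of how such a metric could be computed overall and per topic. It assumes each question is multiple choice and that the model replies with a single answer letter. The record fields (`topic`, `response`, `answer`) and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def normalize_choice(text):
    """Reduce a reply to its first non-space character, uppercased.

    Assumption: the model was prompted to answer with one letter.
    """
    return text.strip().upper()[:1]


def is_correct(response, answer):
    """True when the normalized reply matches a non-empty answer key."""
    key = normalize_choice(answer)
    return key != "" and normalize_choice(response) == key


def overall_accuracy(records):
    """Proportion of correct answers across all records (0.0 if empty)."""
    if not records:
        return 0.0
    correct = sum(is_correct(r["response"], r["answer"]) for r in records)
    return correct / len(records)


def accuracy_by_topic(records):
    """Proportion of correct answers, grouped by the record's topic."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["topic"]] += 1
        hits[r["topic"]] += is_correct(r["response"], r["answer"])
    return {topic: hits[topic] / totals[topic] for topic in totals}


if __name__ == "__main__":
    # Made-up records for illustration only; not data from the study.
    example = [
        {"topic": "Basic Principles", "response": "A", "answer": "A"},
        {"topic": "Safety", "response": "C", "answer": "D"},
    ]
    print(overall_accuracy(example))
    print(accuracy_by_topic(example))
```

Free-text replies would need a more careful answer extractor than `normalize_choice`; taking the first character is a simplification chosen to keep the sketch short.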

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems