Performance of GPT-5 in Brain Tumor MRI Reasoning

Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, and Xiaofeng Yang

arXiv:2508.10865 · cs.CV · August 15, 2025


TL;DR

This study evaluates the ability of GPT-5 models to reason about brain tumor MRI using a specialized visual question answering (VQA) benchmark, finding moderate accuracy and highlighting current limitations for clinical use.

Contribution

First comprehensive assessment of GPT-5 models on neuro-oncological MRI VQA tasks, demonstrating their capabilities and limitations in structured medical reasoning.

Findings

1. GPT-5-mini achieved the highest macro-average accuracy (44.19%).
2. GPT-5-mini and GPT-5 outperformed GPT-4o and GPT-5-nano.
3. Performance varies across tumor subtypes.

Abstract

Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from three Brain Tumor Segmentation (BraTS) datasets: glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features, which were transformed into standardized VQA items. Models were evaluated in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5…
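The headline figure is a macro-average accuracy, which can differ noticeably from a pooled (micro-average) accuracy when subsets have different sizes. The minimal sketch below illustrates the distinction. It assumes, for illustration only, that the average is taken over the three BraTS-derived subsets (GLI, MEN, MET); the abstract does not state which grouping the authors used. The function names and the toy records are hypothetical and are not data or results from the study.

```python
from collections import defaultdict

def macro_average_accuracy(records):
    """Unweighted mean of per-subset accuracies.

    records: iterable of (subset_name, is_correct) pairs (hypothetical format).
    Returns (per_subset_accuracy_dict, macro_average).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, is_correct in records:
        total[subset] += 1
        correct[subset] += int(is_correct)
    if not total:
        raise ValueError("no records to score")
    per_subset = {s: correct[s] / total[s] for s in total}
    # Each subset counts equally, regardless of how many items it contains.
    macro = sum(per_subset.values()) / len(per_subset)
    return per_subset, macro

def micro_accuracy(records):
    """Pooled accuracy: every item counts equally, so large subsets dominate."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    return sum(int(ok) for _, ok in records) / len(records)

# Toy, invented records: GLI has more items than MEN or MET.
toy_records = (
    [("GLI", True)] * 6
    + [("MEN", False)] * 2
    + [("MET", True), ("MET", False)]
)

if __name__ == "__main__":
    per_subset, macro = macro_average_accuracy(toy_records)
    print(per_subset)                   # {'GLI': 1.0, 'MEN': 0.0, 'MET': 0.5}
    print(macro)                        # 0.5
    print(micro_accuracy(toy_records))  # 0.7
```

In this toy case the pooled accuracy (0.7) is inflated by the larger GLI subset, while the macro average (0.5) weights each tumor type equally, which is one reason a benchmark spanning subtypes of unequal size might report the macro figure.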