UCSF-PDGM-VQA: Visual Question Answering dataset for brain tumor MRI interpretation
Shiv Ghosh, Junayd Lateef, Chih-Hua Liu, Yannan Yu, Andreas M. Rauschecker, Madhumita Sushil

TL;DR
This paper introduces the UCSF-PDGM-VQA dataset, a benchmark for evaluating vision-language models on brain tumor MRI interpretation, revealing current models' limitations in clinical neuro-oncology applications.
Contribution
The paper presents a new VQA benchmark for brain tumor MRI analysis and evaluates six models, highlighting their inability to effectively process complex 3D multi-sequence scans.
Findings
Current models struggle with multi-sequence 3D MRI data.
Models tend to rely on language priors, neglecting visual features.
There is a critical need for domain-specific vision-language models.
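One common way to probe the second finding (reliance on language priors) is to score a model twice on the same questions, once with the image and once without, and compare the two accuracies. The sketch below is illustrative only and is not the paper's evaluation protocol. The `model` callable, the exact-match scoring rule, and the toy data are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical interface: a model takes a question and an optional image
# (None means "no image supplied") and returns an answer string.
Model = Callable[[str, Optional[object]], str]


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Case-insensitive exact-match accuracy (an assumed scoring rule)."""
    if not answers:
        raise ValueError("answers must not be empty")
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have equal length")
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def language_prior_gap(
    model: Model,
    questions: Sequence[str],
    images: Sequence[object],
    answers: Sequence[str],
) -> Tuple[float, float, float]:
    """Return (accuracy with image, accuracy without image, difference).

    A difference near zero suggests the model answers from the question
    text alone rather than from visual features.
    """
    with_image = [model(q, img) for q, img in zip(questions, images)]
    without_image = [model(q, None) for q in questions]
    acc_with = accuracy(with_image, answers)
    acc_without = accuracy(without_image, answers)
    return acc_with, acc_without, acc_with - acc_without


def always_yes_model(question: str, image: Optional[object]) -> str:
    """Toy stand-in that ignores the image entirely (worst case)."""
    return "yes"


# Toy data, invented for illustration; not drawn from UCSF-PDGM-VQA.
toy_questions = ["Is there enhancement?", "Is there edema?",
                 "Is the lesion frontal?", "Is there necrosis?"]
toy_images = ["scan_a", "scan_b", "scan_c", "scan_d"]
toy_answers = ["yes", "no", "yes", "yes"]

toy_result = language_prior_gap(
    always_yes_model, toy_questions, toy_images, toy_answers
)
```

For the toy model, which never looks at the image, the two accuracies are identical and the gap is zero. That is the signature of a model answering purely from language priors.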
Abstract
Brain tumor diagnosis relies largely on Magnetic Resonance Imaging (MRI) evaluation, which requires radiologists to synthesize thousands of images across multiple 3D sequences and longitudinal studies. This process requires advanced neuro-radiology training, poses a substantial cognitive load, and is highly time-consuming. Despite increasing demands in radiology, this expertise is difficult to scale, straining current health systems. Vision-Language Models (VLMs) provide an opportunity to reduce this burden through semi-automated, interactive interpretation of complex brain MRIs. However, they are currently underutilized in neuro-oncology due to a lack of specialized benchmarks for evaluating them. We introduce a clinically relevant visual question answering (VQA) benchmark -- the UCSF-PDGM-VQA dataset -- consisting of 2,387 QA pairs from 473 glioma-related MRI studies in the…
