OmniBrainBench: A Comprehensive Multimodal Benchmark for Brain Imaging Analysis Across Multi-stage Clinical Tasks

Zhihao Peng; Cheng Wang; Shengyuan Liu; Zhiying Liang; Zanting Ye; Minjie Ju; PeterYM Woo; Yixuan Yuan

arXiv:2511.00846·cs.CV·December 29, 2025

TL;DR

OmniBrainBench is a comprehensive multimodal benchmark for evaluating how well multimodal large language models (MLLMs) understand brain imaging across multiple clinical tasks. It reveals significant gaps between current models and physicians.

Contribution

The paper introduces OmniBrainBench, which it describes as the first comprehensive multimodal visual question-answering (VQA) benchmark for brain imaging analysis. It covers 15 imaging modalities and 15 clinical tasks, enabling thorough assessment of MLLMs in medical contexts.

Findings

01. Proprietary MLLMs like GPT-5 outperform others but still lag behind physicians.

02. All models struggle with complex preoperative reasoning tasks.

03. Open-source models excel in specific tasks but lack general clinical understanding.

Abstract

Brain imaging analysis is crucial for diagnosing and treating brain disorders, and multimodal large language models (MLLMs) are increasingly used to support it. However, current brain imaging visual question-answering (VQA) benchmarks either cover a limited number of imaging modalities or are restricted to coarse-grained pathological descriptions, which hinders a comprehensive assessment of MLLMs across the full clinical continuum. To address these limitations, we introduce OmniBrainBench, the first comprehensive multimodal VQA benchmark specifically designed to assess the multimodal comprehension capabilities of MLLMs in brain imaging analysis with closed- and open-ended evaluations. OmniBrainBench comprises 15 distinct brain imaging modalities collected from 30 verified medical sources, yielding 9,527 validated VQA pairs and 31,706 images. It simulates clinical workflows and encompasses 15 multi-stage…
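As a rough illustration of what a closed-ended evaluation over VQA pairs can look like, the sketch below scores multiple-choice predictions and reports accuracy overall and per task. It is not the authors' evaluation protocol: the record fields (`task`, `answer`, `prediction`), the task names, and the case-insensitive exact-match rule are all assumptions made for this example, and the toy records are not drawn from the benchmark.

```python
from collections import defaultdict


def closed_ended_accuracy(records):
    """Score multiple-choice VQA records by exact match on the option letter.

    Each record is a dict with hypothetical keys:
      "task"       - name of the clinical task the question belongs to
      "answer"     - ground-truth option letter, e.g. "A"
      "prediction" - option letter produced by the model
    Returns (overall_accuracy, {task: accuracy}).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        task = rec["task"]
        total[task] += 1
        # Compare option letters, ignoring case and surrounding whitespace.
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[task] += 1
    per_task = {task: correct[task] / total[task] for task in total}
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_task


# Toy records with placeholder task names (not real benchmark data).
records = [
    {"task": "task_a", "answer": "A", "prediction": "a"},
    {"task": "task_a", "answer": "C", "prediction": "B"},
    {"task": "task_b", "answer": "D", "prediction": "D "},
]
overall, per_task = closed_ended_accuracy(records)
```

Reporting accuracy per task as well as overall matters for a multi-task benchmark, since a strong average can hide weak performance on individual tasks. Open-ended answers would need a different scoring rule than exact match.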

Code & Models

Datasets

FrankPN/OmniBrainBench
dataset · 40 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Neurobiology of Language and Bilingualism