NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI

Cosmin I. Bercea; Jun Li; Philipp Raffler; Evamaria O. Riedel; Lena Schmitzer; Angela Kurz; Felix Bitzer; Paula Roßmüller; Julian Canisius; Mirjam L. Beyrle; Che Liu; Wenjia Bai; Bernhard Kainz; Julia A. Schnabel; Benedikt Wiestler

arXiv:2505.14064·eess.IV·May 21, 2025

TL;DR

NOVA is a challenging, real-world, evaluation-only benchmark of approximately 900 brain MRI scans covering 281 rare pathologies, designed to test whether models can detect, localize, and reason about unseen anomalies without having trained on such data.

Contribution

The paper introduces NOVA, a novel benchmark dataset for out-of-distribution detection and clinical reasoning in brain MRI, emphasizing evaluation of models on truly unseen, rare pathologies.

Findings

01

Baseline models show significant performance drops on NOVA.

02

NOVA reveals limitations of current vision-language models in medical out-of-distribution tasks.

03

The benchmark encourages the development of more robust, generalizable medical AI models.

Abstract

In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Out-of-distribution detection identifies whether an input stems from an unseen distribution, while open-world recognition flags such inputs to ensure the system remains robust as ever-emerging, previously unknown categories appear and must be addressed without retraining. Foundation and vision-language models are pre-trained on large and diverse datasets with the expectation of broad generalization across domains, including medical imaging. However, benchmarking these models on test sets with only a few common outlier types silently collapses the evaluation back to a closed-set problem, masking failures on rare or truly novel conditions encountered in clinical use. We therefore present NOVA, a challenging, real-life evaluation-only benchmark of ~900 brain MRI…

Code & Models

Datasets

c-i-ber/Nova (dataset · 343 downloads)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications