NeuralBench: A Unifying Framework to Benchmark NeuroAI Models

Hubert Banville; Stéphane d'Ascoli; Simon Dahan; Jérémy Rapin; Marlène Careil; Yohann Benchetrit; Jarod Lévy; Saarang Panchavati; Antoine Ratouchniak; Mingfang (Lucy) Zhang; Elisa Cascardi; Katelyn Begany; Teon Brooks; Jean-Rémi King

arXiv:2605.08495 · cs.LG · May 12, 2026

TL;DR

NeuralBench is an open-source, unified framework for benchmarking AI models of brain activity. Its first release, NeuralBench-EEG v1.0, covers 36 EEG tasks, 14 deep learning architectures, and 94 datasets, and shows that current foundation models only marginally outperform task-specific models.

Contribution

This work introduces NeuralBench, a comprehensive, extensible platform for standardized evaluation of neuroimaging AI models across modalities and datasets. The first release, NeuralBench-EEG v1.0, focuses on EEG.

Findings

01. Foundation models only marginally outperform task-specific models.

02. Many neuroimaging tasks remain highly challenging for current models.

Abstract

Deep learning and large public datasets have recently catalyzed the proliferation of AI models for processing brain recordings. However, systematically evaluating these models remains a challenge: not only do the preprocessing pipelines, training and finetuning approaches largely vary across studies, but their downstream evaluation is often limited to small sets of tasks and/or datasets. Here, we present NeuralBench: a unified framework for benchmarking AI models of brain activity. We accompany this framework with NeuralBench-EEG v1.0 -- a large EEG benchmark that includes 36 electroencephalography (EEG) tasks and 14 deep learning architectures, and is evaluated on 94 datasets accessed through a standardized interface. This first EEG-focused release already highlights two main findings. First, current foundation models only marginally outperform task-specific models. Second, a large set…
