A foundation model with multi-variate parallel attention to generate neuronal activity

Francesco Carzaniga; Michael Hersche; Abu Sebastian; Kaspar Schindler; Abbas Rahimi

arXiv:2506.20354·cs.LG·August 26, 2025

TL;DR

This paper introduces MVPFormer, a foundation model for electrophysiology that uses a novel multi-variate parallel attention mechanism to effectively model heterogeneous iEEG data, achieving state-of-the-art clinical task performance.

Contribution

The paper presents MVPA, a new self-attention mechanism for heterogeneous time-series, and MVPFormer, the first open-source iEEG foundation model with superior clinical task results.

Findings

01

MVPFormer surpasses the state of the art in seizure detection across multiple datasets.

02

MVPA matches or outperforms existing attention models on standard tasks.

03

The SWEC iEEG dataset is the largest publicly available iEEG dataset to date and is released to support future research.

Abstract

Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks, particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the SWEC iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of…
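The abstract describes MVPA as separating content, temporal, and spatial attention. One way to picture that separation, purely as an illustration, is an attention logit built from three additive terms: a content term from the query-key dot product, a term that depends only on the time lag between two tokens, and a term that depends only on their channel offset. The sketch below is written under those assumptions and is not the paper's formulation: the name `mvpa_like_attention`, the additive form, and the `time_bias` and `chan_bias` callables are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def mvpa_like_attention(q, k, v, positions, time_bias, chan_bias):
    """Illustrative attention with three additive logit terms.

    q, k, v    : lists of vectors, one per token.
    positions  : list of (channel, time) pairs, one per token.
    time_bias  : hypothetical callable, time lag -> scalar logit term.
    chan_bias  : hypothetical callable, channel offset -> scalar logit term.
    """
    d = len(q[0])
    out = []
    for i, (ci, ti) in enumerate(positions):
        logits = []
        for j, (cj, tj) in enumerate(positions):
            # Content term: scaled dot product between query i and key j.
            content = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            # Temporal and spatial terms depend only on relative offsets.
            logits.append(content + time_bias(ti - tj) + chan_bias(ci - cj))
        w = softmax(logits)
        # Weighted sum of value vectors.
        out.append([sum(w[j] * v[j][x] for j in range(len(v)))
                    for x in range(len(v[0]))])
    return out
```

Because the two bias terms in this sketch depend only on relative offsets, the same function runs unchanged on inputs with different numbers of channels, which is the property the abstract emphasizes.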

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

- The paper is well-written with a clear motivation for the problem and an architectural approach.
- The strong architecture design enables practical foundation models for clinical iEEG.
- MVPA's decomposition of attention into content, temporal, and spatial components is novel and specifically addresses the real-world challenge of heterogeneous channel configurations in clinical data.
- The model demonstrates superior performance on seizure detection across three datasets and competitive resu…

Weaknesses

- The model is trained only on iEEG data, whereas foundation models typically leverage diverse datasets across multiple domains and modalities.
- The model is fine-tuned on target tasks, so calling evaluation on "unseen subjects" zero-shot is inaccurate. A true zero-shot setting would normally require no task-specific training data. Should it be called few-shot instead?
- Section 5.3 abruptly shifts to generic time-series forecasting and classification tasks, creating a disjointed narrative that dilutes the pape…

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. MVPA factorizes self-attention into content, time, and channel to handle heterogeneous, variable-channel signals that standard attention struggles with. MVPFormer’s generative pretraining and the Long-term iEEG corpus add fresh angles on both model and data.
2. The method is solid and scalable, with a coherent pretraining recipe followed by light adaptation.
3. The three MVPA components and their roles in the logits are well-explained with figures. Pretraining and fine-tuning protocols are m…

Weaknesses

1. Zero-shot tests use manual channel selection at inference, and preprocessing, post-processing, and thresholds are not harmonized across baselines. The reported gains may stem from pipeline differences rather than the core method.
2. Overstated “expert-level” claim: A single Kappa threshold from prior work is used instead of a same-dataset, same-protocol human comparison. No per-subject confidence intervals or significance tests are reported.
3. The MVPA decomposition lacks a rigorous derivati…

Reviewer 03 · Rating 8 · Confidence 5

Strengths

● Overall, this paper represents a very strong contribution to the community.
● Releases a large amount of data. I am personally not aware of a larger publicly available iEEG dataset. This is a big boon to the community. Especially so, since many other foundation models for intracranial signal train on private data, e.g., BrainWave.
● Evaluation is thorough: the authors use their own seizure detection task as well as the Brain Treebank tasks.
● Performance on the epilepsy detection task exceeds…

Weaknesses

● Am I misunderstanding something? The paper refers to a "generative" objective, but the loss seems to be discriminative, i.e., an InfoNCE loss? The output of the model is in the embedding space, not neural activity, correct?
● Line 364: The claim is that the choice of objective is justified by an ablation. But if I read appendix G.14 correctly, it seems that there is only justification for doing pretraining, not the specific type of pretraining, i.e., some other choice of pre-training objective…
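Reviewer 03's first question turns on what an InfoNCE loss computes. As background only, the sketch below shows a generic InfoNCE loss: a predicted embedding is scored against a set of candidate embeddings, and the loss is the negative log-probability of the true candidate under a softmax over those scores. It is a minimal illustration of the general technique, not the paper's training objective. The function name `info_nce`, the cosine similarity, and the `temperature` default are assumptions chosen for the example.

```python
import math


def info_nce(pred, candidates, positive_index, temperature=0.1):
    """Generic InfoNCE loss for one prediction (illustrative only).

    pred           : predicted embedding (list of floats).
    candidates     : candidate embeddings; one is the true target.
    positive_index : index of the true target in `candidates`.
    temperature    : assumed softmax temperature.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    logits = [cosine(pred, c) / temperature for c in candidates]
    # log-sum-exp computed stably, then cross-entropy against the positive.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_index]
```

The loss only asks the model to rank the true embedding above the alternatives, which is why the reviewer characterizes such an objective as discriminative in embedding space.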

Code & Models

Repositories

ibm/multi-variate-parallel-transformer
pytorch · Official

Datasets

NeuroTec/SWEC_iEEG_Dataset
dataset · 2.2k dl

Taxonomy

Topics: Neural Networks and Applications

Methods: Dropout · Dense Connections · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Transformer