A foundation model with multi-variate parallel attention to generate neuronal activity
Francesco Carzaniga, Michael Hersche, Abu Sebastian, Kaspar Schindler, Abbas Rahimi

TL;DR
This paper introduces MVPFormer, a foundation model for electrophysiology that uses a novel multi-variate parallel attention mechanism to effectively model heterogeneous iEEG data, achieving state-of-the-art clinical task performance.
Contribution
The paper presents MVPA, a new self-attention mechanism for heterogeneous time-series, and MVPFormer, the first open-source iEEG foundation model with superior clinical task results.
Findings
MVPFormer surpasses the state of the art in seizure detection across multiple datasets.
MVPA achieves comparable or better performance than existing attention models on standard tasks.
The SWEC iEEG dataset is the largest publicly available iEEG dataset, supporting future research.
Abstract
Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks, particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the SWEC iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of…
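The abstract describes MVPA only at a high level. To illustrate what "disentangling content, temporal, and spatial attention" can mean, the NumPy sketch below flattens a (channel, time) token grid and sums three additive logit terms: a content term q·k/√d, a learned bias per relative time offset, and a learned bias that only distinguishes same-channel from cross-channel pairs. This is a toy modeled loosely on relative-position attention, not the paper's MVPA formulation. The function name `factorized_attention` and the parameters `time_bias`, `chan_bias`, `max_rel`, and `causal` are hypothetical.

```python
import numpy as np

def factorized_attention(x, w_q, w_k, w_v, time_bias, chan_bias, max_rel, causal=False):
    """Toy attention with additive content, temporal, and spatial logit terms.

    x:         (C, T, d) tokens, one per (channel, time step).
    time_bias: (2 * max_rel + 1,) hypothetical learned bias per clipped relative time offset.
    chan_bias: (2,) hypothetical learned bias, index 0 = different channel, 1 = same channel.
    Returns    (C, T, d_v) outputs.
    """
    C, T, d = x.shape
    tokens = x.reshape(C * T, d)           # channel-major flattening: index = c * T + t
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    c_idx = np.repeat(np.arange(C), T)     # channel index of each flattened token
    t_idx = np.tile(np.arange(T), C)       # time index of each flattened token

    content = q @ k.T / np.sqrt(q.shape[-1])
    rel_t = np.clip(t_idx[:, None] - t_idx[None, :], -max_rel, max_rel) + max_rel
    temporal = time_bias[rel_t]            # depends only on the time offset between tokens
    spatial = chan_bias[(c_idx[:, None] == c_idx[None, :]).astype(int)]  # same vs. other channel

    logits = content + temporal + spatial
    if causal:
        # Hide keys from future time steps (any channel); each row keeps its own step.
        logits = np.where(t_idx[None, :] > t_idx[:, None], -np.inf, logits)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return (weights @ v).reshape(C, T, -1)

rng = np.random.default_rng(0)
d, d_v, max_rel = 8, 8, 4
params = dict(
    w_q=rng.normal(size=(d, d)), w_k=rng.normal(size=(d, d)), w_v=rng.normal(size=(d, d_v)),
    time_bias=rng.normal(size=2 * max_rel + 1), chan_bias=rng.normal(size=2), max_rel=max_rel,
)
# The same parameters apply to recordings with different numbers of channels.
out_a = factorized_attention(rng.normal(size=(3, 10, d)), **params)
out_b = factorized_attention(rng.normal(size=(7, 10, d)), **params)
print(out_a.shape, out_b.shape)  # (3, 10, 8) (7, 10, 8)
```

Because no parameter's shape depends on the channel count C, the same weights run on a 3-channel and a 7-channel recording, which is the property the abstract emphasizes for heterogeneous iEEG montages. Dense attention over all C·T tokens costs O((CT)²) memory, so this toy says nothing about the efficiency the abstract claims for MVPA.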
Peer Reviews
Decision: ICLR 2026 Poster
- The paper is well-written, with a clear motivation for the problem and the architectural approach.
- The strong architecture design enables practical foundation models for clinical iEEG.
- MVPA's decomposition of attention into content, temporal, and spatial components is novel and specifically addresses the real-world challenge of heterogeneous channel configurations in clinical data.
- The model demonstrates superior performance on seizure detection across three datasets and competitive results…
- The model is trained only on iEEG data, whereas foundation models typically leverage diverse datasets across multiple domains and modalities.
- The model is fine-tuned on target tasks, so calling the evaluation on "unseen subjects" zero-shot is inaccurate. A true zero-shot evaluation would normally require no task-specific training data. Should it be called few-shot instead?
- Section 5.3 abruptly shifts to generic time-series forecasting and classification tasks, creating a disjointed narrative that dilutes the paper…
1. MVPA factorizes self-attention into content, time, and channel to handle heterogeneous, variable-channel signals that standard attention struggles with. MVPFormer’s generative pretraining and the Long-term iEEG corpus add fresh angles on both model and data.
2. The method is solid and scalable, with a coherent pretraining recipe followed by light adaptation.
3. The three MVPA components and their roles in the logits are well explained with figures. Pretraining and fine-tuning protocols are…
1. Zero-shot tests use manual channel selection at inference, and preprocessing, post-processing, and thresholds are not harmonized across baselines. The reported gains may therefore stem from pipeline differences rather than the core method.
2. Overstated “expert-level” claim: a single kappa threshold from prior work is used instead of a same-dataset, same-protocol human comparison. No per-subject confidence intervals or significance tests are reported.
3. The MVPA decomposition lacks a rigorous derivation…
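The second review's objection concerns a kappa threshold taken from prior work. Assuming the kappa in question is Cohen's kappa, it measures agreement between two raters beyond what chance would produce, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected from each rater's label frequencies. The sketch below computes it for two label sequences, for example per-segment seizure/non-seizure calls from a model and from a clinician; the function name and the example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)    # chance agreement from label frequencies
    if p_e == 1:
        return 1.0  # both raters use one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical per-segment labels (1 = seizure, 0 = no seizure).
model     = [0, 0, 1, 1, 0, 1, 0, 0]
clinician = [0, 0, 1, 0, 0, 1, 0, 1]
print(round(cohens_kappa(model, clinician), 3))  # 0.467
```

Here the raters agree on 6 of 8 segments (p_o = 0.75), but chance alone would give p_e = 34/64, so κ = 7/15. A single κ value like this, compared against a threshold from another study, is exactly the kind of evidence the reviewer considers insufficient without a same-protocol human baseline.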
● Overall, this paper represents a very strong contribution to the community.
● It releases a large amount of data. I am personally not aware of a larger publicly available iEEG dataset. This is a big boon to the community, especially since many other foundation models for intracranial signals train on private data, e.g., BrainWave.
● The evaluation is thorough: the authors use their own seizure detection task as well as the Brain Treebank tasks.
● Performance on the epilepsy detection task exceeds…
● Am I misunderstanding something? The paper refers to a "generative" objective, but the loss seems to be discriminative, i.e., an InfoNCE loss? The output of the model is in the embedding space, not in the space of neural activity, correct?
● Line 364: The claim is that the choice of objective is justified by an ablation. But if I read Appendix G.14 correctly, it only justifies doing pretraining at all, not the specific type of pretraining, i.e., some other choice of pretraining objective…
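The third review asks whether the pretraining loss is a discriminative InfoNCE loss rather than a generative one. For readers unfamiliar with the distinction, the sketch below shows a generic InfoNCE objective: each predicted embedding must score its own target above every other target in the batch under a softmax cross-entropy, so the model is trained to output embeddings rather than raw signal values. Whether MVPFormer's objective takes exactly this form is not established here; the function name `info_nce` and the `temperature` parameter are assumptions.

```python
import numpy as np

def info_nce(pred, target, temperature=0.1):
    """Generic InfoNCE loss.

    pred, target: (N, d) embeddings; row i of target is the positive for row i of pred,
    and the other N - 1 rows act as negatives.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = pred @ target.T / temperature          # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax over each row
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # positives lie on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 32))
# Matched prediction/target pairs give a far lower loss than unrelated targets.
print(info_nce(z, z) < info_nce(z, rng.normal(size=(16, 32))))  # True
```

The loss never compares predicted and true signal amplitudes directly, only similarities in embedding space, which is the crux of the reviewer's question about calling the objective generative.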
Taxonomy
Topics: Neural Networks and Applications
Methods: Dropout · Dense Connections · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Transformer
