Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model

Sam Gijsen; Marc-Andre Schulz; Kerstin Ritter

arXiv:2512.11582 · cs.LG · March 3, 2026

TL;DR

Brain-Semantoks is a self-supervised framework that learns robust, high-level representations of brain dynamics from noisy fMRI data, improving downstream task performance and out-of-distribution generalization.

Contribution

The paper presents a novel semantic tokenizer and self-distillation training method tailored for brain dynamics, enabling more stable and meaningful representations from noisy fMRI signals.

Findings

01. Effective in downstream tasks with linear probes
02. Outperforms existing models in out-of-distribution scenarios
03. Scaling with more unlabeled data improves performance

Abstract

The development of foundation models for functional magnetic resonance imaging (fMRI) time series holds significant promise for predicting phenotypes related to disease and cognition. Current models, however, are often trained using a mask-and-reconstruct objective on small brain regions. This focus on low-level information leads to representations that are sensitive to noise and temporal fluctuations, necessitating extensive fine-tuning for downstream tasks. We introduce Brain-Semantoks, a self-supervised framework designed specifically to learn abstract representations of brain dynamics. Its architecture is built on two core innovations: a semantic tokenizer that aggregates noisy regional signals into robust tokens representing functional networks, and a self-distillation objective that enforces representational stability across time. We show that this objective is stabilized through…
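As a rough illustration of the aggregation idea in the abstract, the sketch below averages regional (ROI) time series within each functional network to form one token series per network. This is not the paper's tokenizer, whose learned architecture is not described on this page. The function `network_tokens`, the mean-pooling choice, the round-robin ROI-to-network assignment, and the synthetic data are all assumptions made for illustration; the count of nine networks follows the reviewers' description.

```python
import numpy as np


def network_tokens(roi_series: np.ndarray,
                   roi_to_network: np.ndarray,
                   n_networks: int) -> np.ndarray:
    """Mean-pool ROI time series within each network (hypothetical helper).

    roi_series:     (n_rois, n_timepoints) regional signals
    roi_to_network: (n_rois,) integer network label per ROI, in [0, n_networks)
    returns:        (n_networks, n_timepoints) one token series per network
    """
    tokens = np.zeros((n_networks, roi_series.shape[1]))
    for k in range(n_networks):
        members = roi_to_network == k
        if members.any():  # leave empty networks as zeros
            tokens[k] = roi_series[members].mean(axis=0)
    return tokens


# Synthetic example (assumed sizes): 90 ROIs, 200 time points, 9 networks.
rng = np.random.default_rng(0)
n_rois, n_timepoints, n_networks = 90, 200, 9
roi_to_network = np.arange(n_rois) % n_networks  # round-robin assignment

# Each network has a shared signal; every ROI adds its own independent noise.
shared = rng.standard_normal((n_networks, n_timepoints))
noisy = shared[roi_to_network] + 2.0 * rng.standard_normal((n_rois, n_timepoints))

tokens = network_tokens(noisy, roi_to_network, n_networks)

# Mean squared deviation from the shared network signal, before and after pooling.
roi_err = np.mean((noisy - shared[roi_to_network]) ** 2)
token_err = np.mean((tokens - shared) ** 2)
print(f"per-ROI error: {roi_err:.2f}, per-token error: {token_err:.2f}")
```

With independent noise, averaging m regions reduces the noise variance by roughly a factor of m, which is the intuition behind pooling noisy regional signals into network-level tokens.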

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The model is evaluated on multiple datasets, including UK Biobank, ABIDE, HBN, SRPBS, and LEMON, and shows consistent improvements over strong baselines such as BrainLM and Brain-JEPA.
- The paper also includes extensive ablation studies on the effects of the semantic tokenizer design, temporal regularizer duration, masking type, loss components, and masking ratio.

Weaknesses

- A main weakness is that all experiments rely solely on resting-state fMRI data, which limits the claim of being a true foundation model.
- Tables 1 and 2 need a statistical comparison between the best model and the other models. This is commonly done with a pairwise test, with p-values then corrected for multiple comparisons.
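The kind of comparison Reviewer 01 asks for could be sketched as follows. The reviewer does not name a specific test or correction, so the exact sign-flip permutation test and the Holm step-down correction used here are assumptions, as are the helper names `paired_sign_flip_p` and `holm_adjust`. The per-fold scores are invented placeholders, not results from the paper.

```python
from itertools import product


def paired_sign_flip_p(a, b):
    """Exact two-sided sign-flip permutation p-value for paired scores."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    # Under the null, each paired difference is equally likely to be +d or -d.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            hits += 1
    return hits / total


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the (rank+1)-th smallest p-value by (m - rank),
        # then enforce monotonicity with a running maximum.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Placeholder per-fold scores (made up for illustration only).
best_model = [0.81, 0.79, 0.83, 0.80, 0.82]
baselines = {
    "baseline_a": [0.78, 0.77, 0.80, 0.79, 0.80],
    "baseline_b": [0.80, 0.80, 0.82, 0.79, 0.83],
}

raw = [paired_sign_flip_p(best_model, scores) for scores in baselines.values()]
for name, p_raw, p_adj in zip(baselines, raw, holm_adjust(raw)):
    print(f"{name}: raw p = {p_raw:.4f}, Holm-adjusted p = {p_adj:.4f}")
```

The exact enumeration is only practical for a small number of paired scores, since it visits all 2^n sign patterns. With many folds or subjects, a Monte Carlo permutation test or a parametric paired test would be the usual substitute.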

Reviewer 02 · Rating 8 · Confidence 4

Strengths

• Innovative use of self-distillation and semantic tokenization to learn stable, abstract representations of brain activity.
• Clear performance improvements over existing foundation models (e.g., BrainLM, Brain-JEPA).
• Semantic tokenizer proves particularly effective for demographic and clinical predictions (age, sex, ASD).
• The shift from low-level voxel embeddings to network-based embeddings is conceptually strong.
• Significant ablation studies to explore the benefit of each component into

Weaknesses

• While results are solid, gains from other architectural components beyond the tokenizer are more modest.
• The paper could discuss temporal resolution more thoroughly: would finer sampling (e.g., sub-2s TR) lead to better representations or unnecessary noise amplification?
• The work focuses exclusively on resting-state data; some commentary on potential extension to task-based fMRI would strengthen the contribution.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- Clear reframing toward semantic abstraction with a neuroscience-grounded tokenizer operating at functional network granularity, reducing token length and noise while injecting inductive bias.
- The slice masking to avoid trivial interpolation is a strong regularization that forces the model to learn meaningful relationships between tokens.
- Well-designed curriculum via TTR that averages network tokens over time early in training, improving stability of the model during training.
- Rigorous

Weaknesses

- The atlas choice is mostly arbitrary. No analysis of how results change with alternative parcellations (Schaefer, Shen, Yeo-17) or different subcortical/cerebellar groupings; no exploration of data-driven network discovery to justify the choice of nine functional networks.
- The geometry of learned network identity embeddings is not analyzed; it is unclear whether they capture canonical inter-network relationships or known hierarchies.
- Precise kernel sequences and decay parameters are uncl

Code & Models

Models

🤗 SamGijsen/Brain-Semantoks (model)

Taxonomy

Topics: Functional Brain Connectivity Studies · EEG and Brain-Computer Interfaces · Generative Adversarial Networks and Image Synthesis