SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis
Mo Wang, Junfeng Xia, Wenhao Ye, Enyu Liu, Kaining Peng, Jianfeng Feng, Quanying Liu, Hongkai Wen

TL;DR
SLIM-Brain is an atlas-free foundation model for fMRI analysis that improves both data and training efficiency, achieving state-of-the-art results with less pre-training data and fewer computational resources.
Contribution
It introduces a two-stage adaptive design combining a lightweight temporal extractor and a hierarchical encoder for efficient voxel-level fMRI modeling.
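The two-stage idea can be illustrated with a minimal sketch, assuming that stage one yields one relevance score per fMRI frame and that stage two encodes only the highest-scoring frames (the reviews quoted further down refer to this as top-k frame selection). The names `select_top_k_frames`, `two_stage_forward`, `score_fn`, and `encode_fn` are hypothetical, and the hard, non-learnable top-k used here is a stand-in, not the paper's learnable selection or its actual encoder.

```python
import numpy as np


def select_top_k_frames(frame_scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring frames, in temporal order."""
    if not 1 <= k <= frame_scores.shape[0]:
        raise ValueError("k must be between 1 and the number of frames")
    # argpartition places the k largest scores in the last k slots
    # without performing a full sort.
    top = np.argpartition(frame_scores, -k)[-k:]
    return np.sort(top)


def two_stage_forward(volumes, score_fn, encode_fn, k):
    """Stage 1 scores every frame cheaply; stage 2 encodes only k frames.

    volumes: array of shape (T, X, Y, Z).
    score_fn and encode_fn are hypothetical stand-ins for the temporal
    extractor and the hierarchical encoder.
    """
    scores = score_fn(volumes)               # shape (T,)
    keep = select_top_k_frames(scores, k)    # shape (k,)
    return encode_fn(volumes[keep]), keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volumes = rng.normal(size=(20, 8, 8, 8))  # toy (T, X, Y, Z) series
    # Assumption: per-frame variance as a toy relevance score,
    # and a mean over the kept frames as a toy "encoder".
    score_fn = lambda v: v.reshape(v.shape[0], -1).var(axis=1)
    encode_fn = lambda v: v.mean(axis=0)
    features, kept = two_stage_forward(volumes, score_fn, encode_fn, k=4)
    print(kept, features.shape)
```

The point of the design is that the expensive encoder sees only `k` of the `T` frames, so its cost scales with `k` rather than with the full sequence length.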
Findings
Achieves state-of-the-art performance on seven benchmarks.
Requires only 4,000 pre-training sessions and about 30% of the GPU memory of traditional methods.
Effectively balances spatial fidelity and computational efficiency.
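As a quick arithmetic check, the per-sample GPU memory figures quoted in the peer reviews (8GB for the baseline versus 2.3GB for SLIM-Brain) can be related to the "about 30%" and "~70% reduction" numbers. The helper name `memory_summary` is hypothetical; only the two memory values come from this page.

```python
def memory_summary(baseline_gb: float, reduced_gb: float) -> tuple[float, float]:
    """Return (fraction of baseline memory used, fractional reduction)."""
    if baseline_gb <= 0:
        raise ValueError("baseline_gb must be positive")
    fraction_used = reduced_gb / baseline_gb
    return fraction_used, 1.0 - fraction_used


# Figures as quoted in the reviews: 8GB -> 2.3GB per sample.
fraction_used, reduction = memory_summary(8.0, 2.3)
print(f"{fraction_used:.0%} of baseline memory, {reduction:.0%} reduction")
# prints "29% of baseline memory, 71% reduction"
```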
Abstract
Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, reducing data dimensionality but discarding fine-grained spatial details, and requiring extremely large cohorts to train effectively as general-purpose foundation models. Atlas-free methods, on the other hand, operate directly on voxel-level information, preserving spatial fidelity, but are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that simultaneously improves both data- and training-efficiency. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures…
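The contrast between the two families can be made concrete with a small sketch, assuming a 4D BOLD array of shape (X, Y, Z, T), an integer atlas volume where 0 marks background, and a boolean brain mask. The function names are hypothetical and the data is synthetic; this illustrates ROI averaging versus voxel-level input in general and is not the paper's preprocessing pipeline.

```python
import numpy as np


def roi_time_series(bold: np.ndarray, atlas: np.ndarray) -> np.ndarray:
    """Atlas-based view: average the voxel signals inside each region.

    bold:  (X, Y, Z, T) array of voxel time series.
    atlas: (X, Y, Z) integer labels, 0 = background.
    Returns an (n_regions, T) array; spatial detail within a region is lost.
    """
    labels = np.unique(atlas)
    labels = labels[labels != 0]
    # bold[atlas == lab] has shape (n_voxels_in_region, T).
    return np.stack([bold[atlas == lab].mean(axis=0) for lab in labels])


def brain_voxel_time_series(bold: np.ndarray, brain_mask: np.ndarray) -> np.ndarray:
    """Atlas-free view: keep every in-brain voxel and drop the background.

    brain_mask: (X, Y, Z) boolean array.
    Returns an (n_brain_voxels, T) array.
    """
    return bold[brain_mask]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bold = rng.normal(size=(4, 4, 4, 10))     # toy volume, 10 time points
    atlas = np.zeros((4, 4, 4), dtype=int)
    atlas[:2, :2, :2] = 1                     # toy region 1 (8 voxels)
    atlas[2:, 2:, 2:] = 2                     # toy region 2 (8 voxels)
    print(roi_time_series(bold, atlas).shape)               # (2, 10)
    print(brain_voxel_time_series(bold, atlas > 0).shape)   # (16, 10)
```

In this toy case the atlas view compresses 16 in-brain voxels into 2 region signals, while the atlas-free view keeps all 16 but still discards the 48 background voxels, which is the kind of saving the reviews attribute to excluding background.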
Peer Reviews
Decision·ICLR 2026 Conference Desk Rejected Submission
1. **Well-motivated problem:** The paper clearly describes the dual bottleneck of data efficiency (atlas-based methods need massive cohorts) and training efficiency (atlas-free methods are computationally prohibitive).
2. **Memory efficiency gains:** Demonstrating ~70% reduction in GPU memory (8GB→2.3GB per sample) by excluding background voxels and using Hiera's unit-wise masking is a practical contribution, especially for democratizing research access.
3. **Neurobiological validation:** Compar
1. **Problems with Experimental Validation:** The authors state they retrain baselines "on the same total number of sessions" and report "best validation checkpoints," but this statement leaves out a lot of information.
   - Did you perform a hyperparameter search when retraining the baselines?
   - Training dynamics (convergence plots) are missing. Brain-JEPA drops from 87.1% (32k) to 54% (1k), and fingerprint accuracy goes to 1.0% (essentially random/collapsed); this suggests the model is collap
- Competitive results compared to shown benchmarks.
- Resource optimization by temporal sampling.
- Efficiency regarding pretraining corpus size.
- I believe the model is not compared to key previous works, specifically Swift (Kim et al., 2023) and the model by Malkiel et al. (2022), and I believe it underperforms them. If this is not the case and the model is shown to be competitive, I will revisit my rating.
- The work proposes a "foundation model" and at the same time claims to be data-efficient; to me the two seem at odds with each other. Either be a foundation model trained on vast amounts of data, or a data-efficient approach. Refer
- The authors conduct ablation studies to justify their design choices. The top-k frame selection idea seems especially innovative and surprisingly effective.
- Innovations such as removing the background and only keeping the brain region also seem to be a meaningful point that other researchers may have overlooked in the past.
- The experiments show that SLIM-Brain shows strong performance. As listed below, I have issues with the HCP experiments, but for the other datasets, the proposed model s
1. **Emphasis on pre-training data burden.** I agree with the authors' point that the need for a large dataset is a huge burden. However, in terms of practical utility, I believe the size of the fine-tuning dataset is a much bigger issue than the size of the pre-training dataset. As long as there is someone that is willing to train and release the model, the user does not need to worry about the scale of the pre-training dataset. The authors demonstrate this very point, as they were able to dow
1. **Novel architectural design**: The combination of global MAE, learnable top-k selection, and 4D Hiera-JEPA is an interesting approach to balance computational efficiency and representation quality.
2. **Significant computational gains**: Achieves a ~70% reduction in GPU memory (8GB → 2.3GB, Table 3) and dramatically faster training (~1 hour vs. 150 hours for baselines).
3. **Strong performance on some tasks**: Particularly impressive on fingerprint identification (98.5%, Table 1) and competiti
## **Major Issues**
**W1. Core claim unvalidated [Critical]** The "data-efficient" claim in the title lacks empirical support. All experiments use only ~1k subjects, while baselines demonstrate scaling curves up to 32k-65k subjects (Figure 1). Given that the model is "training-efficient" (low memory, fast training), scaling experiments should be straightforward. The absence raises critical questions: (1) Does performance saturate quickly (supporting data-efficiency)? (2) Does it improve with m
Taxonomy
Topics: Visual Attention and Saliency Detection · Face Recognition and Perception · Domain Adaptation and Few-Shot Learning
