Batch Effects In Brain Foundation Model Embeddings

Ye Tao; Bradley T. Baker; Yu Wu; Anand D. Sarwate; Sandeep Panta; Sergey Plis; Vince D. Calhoun

arXiv:2604.14441 · eess.SP · April 17, 2026

TL;DR

This study evaluates embeddings from neuroimaging foundation models and finds that they encode substantial batch effects, which often dominate diagnosis-related signal. It also examines how harmonization changes these embeddings.

Contribution

The paper systematically assesses the extent of batch effects in BrainLM and SwiFT embeddings across multi-site fMRI datasets, and compares what the two models represent: regional activity versus interactions between regions.

Findings

01 Embeddings encode substantial batch-related variability.

02 Harmonization reduces batch effects but may affect biological signals.

03 BrainLM captures regional activity; SwiFT captures inter-regional interactions.

Abstract

Foundation models show strong potential for large-scale, high-dimensional biomedical applications, yet their ability to capture relevant neurobiological characteristics remains underexplored. We systematically evaluate embeddings from two neuroimaging foundation models, BrainLM and SwiFT, across multi-site fMRI datasets using a comprehensive evaluation framework. Our results show that foundation model embeddings encode substantial batch-related variability, often dominating diagnosis-related information across heterogeneous datasets. We further investigate how harmonization, applied to reduce batch effects, influences these embeddings. In addition, we find that BrainLM prefers to capture fine-grained regional activity, whereas SwiFT tends to represent interactions between regions, consistent with their respective model architectures. Our study highlights the importance of accounting for…
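
To make the idea of batch-related variability "dominating" diagnosis-related information concrete, here is a minimal sketch on synthetic data. It is not the paper's evaluation framework: the functions `variance_explained` and `center_per_site` are hypothetical names, the embeddings are simulated rather than taken from BrainLM or SwiFT, and per-site mean centering is only a crude stand-in for a real harmonization method. The sketch measures what fraction of total embedding variance lies between site means versus between diagnosis means, before and after centering.

```python
import numpy as np

def variance_explained(embeddings, labels):
    """Fraction of total embedding variance that lies between label-group means."""
    X = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    total_ss = ((X - grand_mean) ** 2).sum()
    between_ss = 0.0
    for g in np.unique(labels):
        Xg = X[labels == g]
        between_ss += len(Xg) * ((Xg.mean(axis=0) - grand_mean) ** 2).sum()
    return between_ss / total_ss

def center_per_site(embeddings, site):
    """Subtract each site's mean embedding (location-only adjustment, assumed here)."""
    X = np.asarray(embeddings, dtype=float).copy()
    site = np.asarray(site)
    for s in np.unique(site):
        mask = site == s
        X[mask] -= X[mask].mean(axis=0)
    return X

# Synthetic stand-in for embeddings: large site shifts, small diagnosis shifts.
rng = np.random.default_rng(0)
n_subjects, dim, n_sites = 400, 16, 4
site = rng.integers(0, n_sites, size=n_subjects)
diagnosis = rng.integers(0, 2, size=n_subjects)
site_shift = rng.normal(0.0, 2.0, size=(n_sites, dim))
dx_shift = rng.normal(0.0, 0.3, size=(2, dim))
emb = rng.normal(size=(n_subjects, dim)) + site_shift[site] + dx_shift[diagnosis]

site_r2 = variance_explained(emb, site)
dx_r2 = variance_explained(emb, diagnosis)

harmonized = center_per_site(emb, site)
site_r2_after = variance_explained(harmonized, site)
dx_r2_after = variance_explained(harmonized, diagnosis)

print(f"before: site={site_r2:.3f} diagnosis={dx_r2:.3f}")
print(f"after:  site={site_r2_after:.3f} diagnosis={dx_r2_after:.3f}")
```

In this toy setup, diagnosis is assigned independently of site, so centering removes the site component while largely preserving the diagnosis component. When diagnosis prevalence differs across sites, the same centering also removes part of the diagnosis-related difference, which is one way a harmonization step can affect biological signal.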
