Batch Effects In Brain Foundation Model Embeddings
Ye Tao, Bradley T. Baker, Yu Wu, Anand D. Sarwate, Sandeep Panta, Sergey Plis, Vince D. Calhoun

TL;DR
This study evaluates embeddings from neuroimaging foundation models. It shows that they encode substantial batch effects, which can overshadow diagnosis-related signals, and it examines how harmonization affects these embeddings.
Contribution
It systematically assesses the extent of batch effects in BrainLM and SwiFT embeddings and compares their regional versus interaction-based representations.
Findings
Embeddings encode substantial batch-related variability.
Harmonization reduces batch effects but may affect biological signals.
BrainLM captures regional activity; SwiFT captures inter-regional interactions.
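One simple way to make "batch-related variability dominating diagnosis-related information" concrete is to ask what fraction of embedding variance each label explains. The sketch below is an illustration under stated assumptions, not the paper's evaluation framework. It uses pooled eta squared (between-group over total sum of squares) as the metric. The function name `variance_explained` and all data are hypothetical: synthetic vectors with a strong site offset and a weaker diagnosis shift stand in for real BrainLM or SwiFT embeddings.

```python
import numpy as np

def variance_explained(emb, labels):
    # Between-group sum of squares divided by total sum of squares (eta squared),
    # pooled over all embedding dimensions: 0 = no group information, 1 = all of it.
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    grand_mean = emb.mean(axis=0)
    total_ss = ((emb - grand_mean) ** 2).sum()
    between_ss = 0.0
    for g in np.unique(labels):
        group = emb[labels == g]
        between_ss += len(group) * ((group.mean(axis=0) - grand_mean) ** 2).sum()
    return between_ss / total_ss

# Hypothetical synthetic stand-in for foundation model embeddings (not data from the paper)
rng = np.random.default_rng(0)
n, d, n_sites = 600, 32, 4
site = rng.integers(0, n_sites, size=n)
diag = rng.integers(0, 2, size=n)                         # diagnosis balanced across sites
site_offsets = rng.normal(scale=1.5, size=(n_sites, d))   # strong synthetic batch effect
diag_shift = rng.normal(scale=0.3, size=d)                # weaker synthetic diagnosis effect
emb = rng.normal(size=(n, d)) + site_offsets[site] + diag[:, None] * diag_shift

eta_site = variance_explained(emb, site)
eta_diag = variance_explained(emb, diag)
print(f"variance explained by site: {eta_site:.3f}, by diagnosis: {eta_diag:.3f}")
```

With real embeddings, a site score far above the diagnosis score would be one symptom of the pattern the findings describe. Classifier-based probes are a common alternative metric.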
Abstract
Foundation models show strong potential for large-scale, high-dimensional biomedical applications, yet their ability to capture relevant neurobiological characteristics remains underexplored. We systematically evaluate embeddings from two neuroimaging foundation models, BrainLM and SwiFT, across multi-site fMRI datasets using a comprehensive evaluation framework. Our results show that foundation model embeddings encode substantial batch-related variability, often dominating diagnosis-related information across heterogeneous datasets. We further investigate how harmonization, applied to reduce batch effects, influences these embeddings. In addition, we find that BrainLM tends to capture fine-grained regional activity, whereas SwiFT tends to represent interactions between regions, consistent with their respective model architectures. Our study highlights the importance of accounting for…
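The trade-off that harmonization "reduces batch effects but may affect biological signals" can be illustrated with a deliberately simple location/scale adjustment. The following is a minimal sketch, assuming per-site z-scoring of each embedding dimension. This is a swapped-in simplification, not ComBat and not necessarily the harmonization used in the paper. The helpers `per_site_zscore` and `group_mean_gap` are hypothetical names. The synthetic design confounds diagnosis with site, so removing the site shift also removes the part of the diagnosis difference that travelled with it.

```python
import numpy as np

def per_site_zscore(emb, site):
    # Center and scale every embedding dimension within each site separately.
    # A simplified location/scale adjustment, NOT full ComBat (no empirical Bayes
    # shrinkage and no explicit preservation of biological covariates).
    emb = np.asarray(emb, dtype=float)
    site = np.asarray(site)
    out = np.empty_like(emb)
    for s in np.unique(site):
        idx = site == s
        mu = emb[idx].mean(axis=0)
        sd = emb[idx].std(axis=0)
        out[idx] = (emb[idx] - mu) / np.where(sd > 0, sd, 1.0)  # guard constant dims
    return out

def group_mean_gap(emb, labels):
    # Euclidean distance between the mean embeddings of groups 0 and 1
    return float(np.linalg.norm(emb[labels == 0].mean(axis=0) - emb[labels == 1].mean(axis=0)))

rng = np.random.default_rng(0)
n, d = 400, 16
site = rng.integers(0, 2, size=n)
# Hypothetical confounded design: site 1 enrolls far more patients than site 0
diag = (rng.random(n) < np.where(site == 1, 0.8, 0.2)).astype(int)
emb = rng.normal(size=(n, d)) + 3.0 * site[:, None] + 0.5 * diag[:, None]

harm = per_site_zscore(emb, site)
site_before, site_after = group_mean_gap(emb, site), group_mean_gap(harm, site)
diag_before, diag_after = group_mean_gap(emb, diag), group_mean_gap(harm, diag)
print(f"site gap: {site_before:.2f} -> {site_after:.2e}")
print(f"diagnosis gap: {diag_before:.2f} -> {diag_after:.2f}")
```

In this toy setting the site gap collapses to numerical zero, while the diagnosis gap shrinks but does not vanish: only the within-site diagnosis difference survives. Methods such as ComBat that model biological covariates explicitly aim to limit this kind of loss.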
