Bridging Foundation Models and Efficient Architectures: A Modular Brain Imaging Framework with Local Masking and Pretrained Representation Learning

Yanwen Wang; Xinglin Zhao; Yijin Song; Xiaobo Liu; Yanrong Hao; Rui Cao; Xin Wen

arXiv:2508.16597 · q-bio.NC · August 26, 2025

TL;DR

This paper introduces a modular neuroimaging framework combining foundation models with domain-specific architectures, improving age and cognition prediction from fMRI data through efficient pretraining, clustering, and state-space modeling.

Contribution

It presents a novel modular framework integrating a Local Masked Autoencoder, a Random Walk Mixture of Experts, and a state-space model for improved fMRI analysis.

Findings

01 Achieved a mean absolute error (MAE) of 5.343 for age prediction

02 Attained a Pearson correlation coefficient (PCC) of 0.928 between predicted and true age

03 Outperformed existing state-of-the-art methods
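
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and PCC are conventionally computed for a regression task such as age prediction. The function names and the sample values are hypothetical and chosen only for illustration; they are not the paper's code or data.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def pearson_correlation(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal entry.
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical ages, for illustration only (not values from the paper).
true_age = [20.0, 35.0, 50.0, 65.0, 80.0]
pred_age = [24.0, 33.0, 55.0, 61.0, 78.0]

mae = mean_absolute_error(true_age, pred_age)
pcc = pearson_correlation(true_age, pred_age)
print(f"MAE = {mae:.3f}, PCC = {pcc:.3f}")
```

A lower MAE means predictions are closer to the targets on average, while a PCC near 1 means predictions rise and fall with the targets; the two can disagree, which is why both are usually reported together.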

Abstract

Functional connectivity (FC) derived from resting-state fMRI plays a critical role in personalized predictions such as age and cognitive performance. However, applying foundation models (FM) to fMRI data remains challenging because of the data's high dimensionality, the computational cost, and the difficulty of capturing complex spatiotemporal dynamics and indirect region-of-interest (ROI) interactions. To address these limitations, we propose a modular neuroimaging framework that integrates principles from FM with efficient, domain-specific architectures. Our approach begins with a Local Masked Autoencoder (LMAE) for pretraining, which reduces the influence of hemodynamic response function (HRF) dynamics and suppresses noise. This is followed by a Random Walk Mixture of Experts (RWMOE) module that clusters features across spatial and temporal dimensions, effectively capturing intricate brain…
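
As background for the abstract's opening sentence, here is a minimal sketch of one common way to derive an FC matrix from ROI time series: the pairwise Pearson correlation between ROI signals. The abstract does not state how FC is computed in this work, so the Pearson choice, the function names, the array shapes, and the random input are all assumptions made for illustration.

```python
import numpy as np

def functional_connectivity(roi_timeseries):
    """Pairwise Pearson correlation between ROI signals.

    roi_timeseries: array of shape (n_timepoints, n_rois).
    Returns an (n_rois, n_rois) symmetric matrix with ones on the diagonal.
    """
    ts = np.asarray(roi_timeseries, dtype=float)
    # rowvar=False treats each column (ROI) as one variable.
    return np.corrcoef(ts, rowvar=False)

def upper_triangle_features(fc):
    """Flatten the strictly upper triangle into a feature vector."""
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]

# Hypothetical input: 200 time points for 10 ROIs of random noise.
rng = np.random.default_rng(0)
signals = rng.standard_normal((200, 10))

fc = functional_connectivity(signals)
features = upper_triangle_features(fc)
print(fc.shape, features.shape)
```

With n ROIs the feature vector has n(n-1)/2 entries, which illustrates the high dimensionality the abstract mentions: a few hundred ROIs already yield tens of thousands of connectivity features per subject.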
