TL;DR
Brain-DiT is a multi-state fMRI foundation model pretrained with metadata-conditioned diffusion on 349,898 sessions from 24 datasets, aiming to generalize across brain states (resting, task, naturalistic, disease, and sleep) and downstream tasks.
Contribution
Introduces Brain-DiT, a Diffusion Transformer (DiT)-based pretraining method for multi-state fMRI that conditions on metadata to improve representation learning.
Findings
Diffusion-based pretraining outperforms reconstruction- or alignment-based pretraining.
Metadata conditioning improves downstream task performance.
Different downstream tasks benefit from representations at different scales.
Abstract
Current fMRI foundation models primarily rely on a limited range of brain states and mismatched pretraining tasks, restricting their ability to learn generalized representations across diverse brain states. We present Brain-DiT, a universal multi-state fMRI foundation model pretrained on 349,898 sessions from 24 datasets spanning resting, task, naturalistic, disease, and sleep states. Unlike prior fMRI foundation models that rely on masked reconstruction in the raw-signal space or a latent space, Brain-DiT adopts metadata-conditioned diffusion pretraining with a Diffusion Transformer (DiT), enabling the model to learn multi-scale representations that capture both fine-grained functional structure and global semantics. Across extensive evaluations and ablations on 7 downstream tasks, we find consistent evidence that diffusion-based generative pretraining is a stronger…
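To make "metadata-conditioned diffusion pretraining" concrete, here is a minimal sketch of one training-loss evaluation. The abstract does not give the noise schedule, the loss, or how metadata is injected, so everything here is an assumption: a standard DDPM-style linear schedule with a noise-prediction MSE loss, a hypothetical `STATE_TO_ID` vocabulary built from the five brain states named above, and a `toy_denoiser` that merely stands in for the DiT. It should not be read as the authors' implementation.

```python
import math
import random

# Hypothetical metadata vocabulary (assumption: one id per brain state).
STATE_TO_ID = {"rest": 0, "task": 1, "naturalistic": 2, "disease": 3, "sleep": 4}


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (common DDPM defaults, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0); also return the Gaussian noise used."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def toy_denoiser(xt, t, state_id, params):
    """Stand-in for the DiT: scales x_t by a per-state factor.

    A real model would also use the timestep t and richer metadata;
    this toy ignores t and only shows where the condition enters.
    """
    scale = params["state_scale"][state_id]
    return [scale * v for v in xt]


def diffusion_loss(x0, state, params, alpha_bars, rng):
    """Noise-prediction MSE for one sample at a uniformly drawn timestep."""
    t = rng.randrange(len(alpha_bars))
    xt, eps = add_noise(x0, t, alpha_bars, rng)
    pred = toy_denoiser(xt, t, STATE_TO_ID[state], params)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    params = {"state_scale": [0.1] * len(STATE_TO_ID)}
    signal = [rng.gauss(0.0, 1.0) for _ in range(64)]  # fake fMRI vector
    print(diffusion_loss(signal, "sleep", params, alpha_bars, rng))
```

In this toy setup the metadata reaches the denoiser only as an integer id that selects a scale factor. In an actual DiT the condition would more plausibly enter as a learned embedding alongside the timestep embedding, but the abstract does not say which mechanism Brain-DiT uses.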
