A Structured Review and Quantitative Profiling of Public Brain MRI Datasets for Foundation Model Development
Minh Sao Khue Luu, Margaret V. Benedichuk, Ekaterina I. Roppert, Roman M. Kenzhin, Bair N. Tuchinov

TL;DR
This paper systematically reviews 54 public brain MRI datasets, analyzing their variability and heterogeneity to inform the development of robust foundation models, highlighting the importance of preprocessing and domain adaptation.
Contribution
It provides a comprehensive, multi-level characterization of public brain MRI datasets and evaluates preprocessing effects, emphasizing the need for harmonization in foundation model development.
Findings
Significant heterogeneity in voxel spacing, orientation, and intensity across datasets.
Preprocessing improves within-dataset consistency but residual inter-dataset differences remain.
Residual covariate shift persists after standard preprocessing, affecting model generalization.
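As an illustration of the image-level properties named in these findings, the sketch below derives voxel spacing and an approximate orientation code from a 4x4 voxel-to-world affine. It is a minimal example, not the authors' profiling pipeline. It assumes a NIfTI-style affine in RAS+ world coordinates, and the function names `voxel_spacing` and `axis_codes` are hypothetical. For strongly oblique acquisitions a dedicated library routine would be more robust.

```python
import numpy as np

def voxel_spacing(affine):
    """Voxel size along each image axis (column norms of the 3x3 block)."""
    return np.linalg.norm(np.asarray(affine, dtype=float)[:3, :3], axis=0)

def axis_codes(affine):
    """Approximate orientation codes per image axis, e.g. ('R', 'A', 'S').

    Assumes RAS+ world coordinates: +x = Right, +y = Anterior, +z = Superior.
    Each image axis is labelled by the world axis it is most aligned with.
    """
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    affine = np.asarray(affine, dtype=float)
    # Normalise each column so only the direction remains.
    direction = affine[:3, :3] / voxel_spacing(affine)
    codes = []
    for col in direction.T:
        world_axis = int(np.argmax(np.abs(col)))
        positive = int(col[world_axis] > 0)
        codes.append(labels[world_axis][positive])
    return tuple(codes)

# Hypothetical affine: 1 x 1 x 1.5 mm voxels, first image axis flipped.
example_affine = np.diag([-1.0, 1.0, 1.5, 1.0])
spacing = voxel_spacing(example_affine)
codes = axis_codes(example_affine)
```

Collecting `spacing` and `codes` for every volume in a dataset gives the kind of per-dataset distribution that can then be compared across datasets.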
Abstract
The development of foundation models for brain MRI depends critically on the scale, diversity, and consistency of available data, yet systematic assessments of these factors remain scarce. In this study, we analyze 54 publicly accessible brain MRI datasets encompassing over 538,031 subjects to provide a structured, multi-level overview tailored to foundation model development. At the dataset level, we characterize modality composition, disease coverage, and dataset scale, revealing strong imbalances between large healthy cohorts and smaller clinical populations. At the image level, we quantify voxel spacing, orientation, and intensity distributions across 15 representative datasets, demonstrating substantial heterogeneity that can influence representation learning. We then perform a quantitative evaluation of preprocessing variability, examining how intensity normalization, bias field…
Taxonomy
Topics: Functional Brain Connectivity Studies · Glioma Diagnosis and Treatment · Advanced MRI Techniques and Applications
