A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning
Stefano Cerri, Asbjørn Munk, Sebastian Nørgaard Llambias, Jakob Ambsdorf, Julia Machnio, Vardan Nersesjan, Christian Hedeager Krag, Peirong Liu, Pablo Rocamora García, Mostafa Mehdipour Ghazi, Mikael Boesen, Michael Eriksen Benros, Juan Eugenio Iglesias, Mads Nielsen

TL;DR
FOMO260K is a large-scale, heterogeneous brain MRI dataset designed to facilitate self-supervised learning research in medical imaging. It features diverse scans and minimal preprocessing.
Contribution
The paper introduces FOMO260K, a new, extensive brain MRI dataset, together with code and pretrained models for self-supervised learning.
Findings
Dataset includes 260,927 brain MRI scans from 77,589 sessions and 55,378 subjects, aggregated from 910 publicly available sources.
Supports development of self-supervised learning methods.
Provides pretrained models and code for benchmarking.
Abstract
We present FOMO260K, a large-scale, heterogeneous dataset of 260,927 brain Magnetic Resonance Imaging (MRI) scans from 77,589 MRI sessions and 55,378 subjects, aggregated from 910 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing entry barriers for new users. Companion code for self-supervised pretraining and finetuning is provided, along with pretrained models. FOMO260K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
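The reported totals follow the usual MRI hierarchy (subjects have sessions, sessions have scans), so they imply a few average ratios. The sketch below is only illustrative arithmetic on the numbers in the abstract. It does not describe how scans are actually distributed across subjects or sources, which may be highly uneven.

```python
# Totals as reported in the FOMO260K abstract.
SCANS = 260_927
SESSIONS = 77_589
SUBJECTS = 55_378


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Averages only; per-subject and per-session counts will vary widely in practice.
scans_per_session = ratio(SCANS, SESSIONS)      # about 3.36 scans per session
sessions_per_subject = ratio(SESSIONS, SUBJECTS)  # about 1.4 sessions per subject
scans_per_subject = ratio(SCANS, SUBJECTS)      # about 4.71 scans per subject

print(scans_per_session, sessions_per_subject, scans_per_subject)
```

An average of roughly 3.4 scans per session is consistent with the abstract's note that the dataset includes multiple MRI sequences.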
