Fusion of biomedical imaging studies for increased sample size and diversity: a case study of brain MRI
Matias Aiskovich, Eduardo Castro, Jenna M. Reinen, Shreyas Fadnavis, Anushree Mehta, Hongyang Li, Amit Dhurandhar, Guillermo A. Cecchi, Pablo Polosecki

TL;DR
This paper presents a method to combine multiple brain MRI datasets to increase sample size and diversity for machine learning, addressing challenges in data integration.
Contribution
The paper introduces a flexible database structure and practical approach for homogenizing heterogeneous biomedical imaging datasets.
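This example does not reproduce the paper's database structure. It is a minimal sketch of what "flexible" can mean in practice, assuming a relational layout and using only the Python standard library (`sqlite3`). Studies, scanners, subjects, and images sit in separate tables, and study-specific variables go into a key/value table. All table and column names (`study`, `scanner`, `subject`, `image`, `extra_field`) are hypothetical. Images get their own table because roughly 84,000 images from 54,000 subjects works out to about 1.6 images per subject on average, so some subjects contribute more than one scan.

```python
import sqlite3

# Hypothetical schema for illustration only; names are assumptions, not the paper's structure.
SCHEMA = """
CREATE TABLE study   (study_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE scanner (scanner_id TEXT PRIMARY KEY, manufacturer TEXT, field_strength_t REAL);
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,            -- globally unique: study prefix + local ID
    study_id   TEXT NOT NULL REFERENCES study(study_id),
    sex        TEXT                         -- harmonized coding, NULL if unknown
);
CREATE TABLE image (
    image_id   INTEGER PRIMARY KEY,         -- auto-assigned row ID
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    scanner_id TEXT REFERENCES scanner(scanner_id),  -- NULL when the study omits it
    modality   TEXT NOT NULL,               -- e.g. 'T1w'
    age_years  REAL,                        -- age at scan, in years
    path       TEXT NOT NULL UNIQUE
);
CREATE TABLE extra_field (                  -- study-specific metadata as key/value rows
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    key        TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (subject_id, key)
);
"""

def build_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks enabled."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    return con

con = build_db()
con.executemany("INSERT INTO study VALUES (?, ?)", [("A", "Study A"), ("B", "Study B")])
con.execute("INSERT INTO scanner VALUES ('A-s1', 'Siemens', 3.0)")
con.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                [("A_001", "A", "F"), ("A_002", "A", "M"), ("B_001", "B", None)])
con.executemany(
    "INSERT INTO image (subject_id, scanner_id, modality, age_years, path) VALUES (?, ?, ?, ?, ?)",
    [("A_001", "A-s1", "T1w", 61.0, "a/001_t1.nii.gz"),
     ("A_001", "A-s1", "T1w", 63.0, "a/001_t1_fu.nii.gz"),   # follow-up scan, same subject
     ("A_002", "A-s1", "T1w", 58.5, "a/002_t1.nii.gz"),
     ("B_001", None, "T1w", None, "b/001.nii.gz")])
con.execute("INSERT INTO extra_field VALUES ('B_001', 'handedness', 'left')")

# Subjects and images per study.
rows = con.execute("""
    SELECT s.study_id, COUNT(DISTINCT s.subject_id), COUNT(i.image_id)
    FROM subject s JOIN image i ON i.subject_id = s.subject_id
    GROUP BY s.study_id ORDER BY s.study_id
""").fetchall()
print(rows)  # [('A', 2, 3), ('B', 1, 1)]
```

The key/value table lets the database admit variables that only some studies collect without widening the core tables. A real design might instead use typed columns or a JSON field.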
Findings
The fusion of 12 studies resulted in approximately 84,000 brain MRI images from 54,000 subjects.
Key challenges in dataset integration include heterogeneity in study design and metadata.
A flexible database structure was developed to accommodate diverse MRI datasets.
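The following minimal Python sketch makes the metadata heterogeneity concrete with per-study harmonization rules. It assumes each study ships a flat table of subject records. The two studies, their column names, and their codings (sex as letters versus integers, age in years versus months) are invented for illustration and are not taken from the 12 fused studies.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical per-study conventions; column names and codings are illustrative assumptions.
@dataclass
class StudySpec:
    study_id: str
    id_col: str
    sex_col: str
    sex_codes: dict                          # raw value -> harmonized 'F' / 'M'
    age_col: str
    age_to_years: Callable[[float], float]   # converts the study's age unit to years

SPECS = {
    "A": StudySpec("A", "SubjID", "Gender", {"F": "F", "M": "M"}, "Age", lambda a: a),
    "B": StudySpec("B", "participant", "sex", {1: "M", 2: "F"}, "age_months", lambda m: m / 12.0),
}

def harmonize(study_id: str, record: dict) -> dict:
    """Map one raw subject record onto a common schema."""
    spec = SPECS[study_id]
    raw_age = record.get(spec.age_col)
    return {
        "subject_id": f"{study_id}_{record[spec.id_col]}",    # prefix avoids ID collisions across studies
        "study_id": study_id,
        "sex": spec.sex_codes.get(record.get(spec.sex_col)),  # unmapped or missing codes become None
        "age_years": None if raw_age is None else spec.age_to_years(raw_age),
    }

print(harmonize("A", {"SubjID": "001", "Gender": "F", "Age": 61}))
# {'subject_id': 'A_001', 'study_id': 'A', 'sex': 'F', 'age_years': 61}
print(harmonize("B", {"participant": 17, "sex": 2, "age_months": 750}))
# {'subject_id': 'B_17', 'study_id': 'B', 'sex': 'F', 'age_years': 62.5}
```

Prefixing local IDs with the study code is a simple guard against two studies both having a subject "001". Unmapped codes deliberately become None rather than a guessed value.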
Abstract
Data collection, curation, and cleaning constitute a crucial phase in Machine Learning (ML) projects. In biomedical ML, it is often desirable to leverage multiple datasets to increase sample size and diversity, but this poses unique challenges, which arise from heterogeneity in study design, data descriptors, file system organization, and metadata. In this study, we present an approach to the integration of multiple brain MRI datasets with a focus on homogenization of their organization and preprocessing for ML. We use our own fusion example (approximately 84,000 images from 54,000 subjects, 12 studies, and 88 individual scanners) to illustrate and discuss the issues faced by study fusion efforts, and we examine key decisions necessary during dataset homogenization, presenting in detail a database structure flexible enough to accommodate multiple observational MRI datasets. We believe…
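The abstract also lists file system organization as a source of heterogeneity. One widely used target layout is the Brain Imaging Data Structure (BIDS) naming scheme. The paper may use a different convention, so the sketch below only illustrates the idea of mapping every study onto a single path pattern. The function names and the study-prefix rule are assumptions.

```python
import re
from pathlib import PurePosixPath

def bids_label(raw: str) -> str:
    """BIDS labels must be alphanumeric, so strip everything else (e.g. '_' or '-')."""
    label = re.sub(r"[^A-Za-z0-9]", "", str(raw))
    if not label:
        raise ValueError(f"cannot build a label from {raw!r}")
    return label

def target_path(study_id: str, subject: str, session: str, suffix: str = "T1w") -> PurePosixPath:
    """Hypothetical mapping of one scan onto a BIDS-style relative path."""
    sub = "sub-" + bids_label(study_id + subject)   # study prefix keeps IDs distinct after fusion
    ses = "ses-" + bids_label(session)
    return PurePosixPath(sub, ses, "anat", f"{sub}_{ses}_{suffix}.nii.gz")

print(target_path("A", "001", "baseline"))
# sub-A001/ses-baseline/anat/sub-A001_ses-baseline_T1w.nii.gz
print(target_path("B", "17-x", "m24"))
# sub-B17x/ses-m24/anat/sub-B17x_ses-m24_T1w.nii.gz
```

Because BIDS labels must be alphanumeric, separators are stripped, which can make two different (study, subject) pairs collide (for example "A1" + "23" and "A" + "123"). A real pipeline would check for such collisions or use fixed-width study codes.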
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Taxonomy
Topics: Medical Imaging Techniques and Applications · Machine Learning in Healthcare · Functional Brain Connectivity Studies
