Machine Learning with Multi-Site Imaging Data: An Empirical Study on the Impact of Scanner Effects
Ben Glocker, Robert Robinson, Daniel C. Castro, Qi Dou, Ender Konukoglu

TL;DR
This empirical study shows that scanner effects significantly affect machine learning results on multi-site neuroimaging data: current harmonization methods fail to fully remove scanner-specific bias, which leads to overly optimistic performance estimates and poor generalization.
Contribution
The paper provides an empirical analysis of scanner effects on multi-site structural brain MRI (Cam-CAN and UK Biobank) and highlights the limitations of existing harmonization techniques.
Findings
A classifier can identify which study a scan came from with very high accuracy, even after careful pre-processing with state-of-the-art neuroimaging pipelines.
Current harmonization methods do not fully eliminate scanner-specific biases.
Scanner effects can lead to overly optimistic performance estimates.
Abstract
This is an empirical study to investigate the impact of scanner effects when using machine learning on multi-site neuroimaging data. We utilize structural T1-weighted brain MRI obtained from two different studies, Cam-CAN and UK Biobank. For the purpose of our investigation, we construct a dataset consisting of brain scans from 592 age- and sex-matched individuals, 296 subjects from each original study. Our results demonstrate that even after careful pre-processing with state-of-the-art neuroimaging pipelines a classifier can easily distinguish between the origin of the data with very high accuracy. Our analysis on the example application of sex classification suggests that current approaches to harmonize data are unable to remove scanner-specific bias leading to overly optimistic performance estimates and poor generalization. We conclude that multi-site data harmonization remains an…
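The site-classification check described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the features are synthetic Gaussian vectors, the "scanner effect" is assumed to be a simple additive shift, and all function names and parameters (`make_synthetic_sites`, `site_shift`, and so on) are hypothetical. Only the 296-subjects-per-study size is taken from the abstract. The idea is that if a classifier trained to predict the originating study scores well above chance on held-out subjects, site-specific signal is still present in the features.

```python
import numpy as np


def make_synthetic_sites(n_per_site=296, n_features=20, site_shift=0.5, seed=0):
    """Synthetic stand-in for per-subject image features from two sites.

    ASSUMPTION: the scanner effect is modeled as a constant additive shift
    on every feature of the second site. Real scanner effects are more complex.
    """
    rng = np.random.default_rng(seed)
    X_a = rng.normal(0.0, 1.0, size=(n_per_site, n_features))
    X_b = rng.normal(0.0, 1.0, size=(n_per_site, n_features)) + site_shift
    X = np.vstack([X_a, X_b])
    site = np.repeat([0, 1], n_per_site)  # 0 = first site, 1 = second site
    return X, site


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain logistic regression trained by batch gradient descent."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y)) / len(y)
    return w


def predict(w, X):
    """Predict the site label (0 or 1) from the sign of the linear score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0).astype(int)


def site_classification_accuracy(X, site, n_folds=5, seed=0):
    """Mean held-out accuracy of predicting the site, via k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(site)), n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Standardize using training-fold statistics only, to avoid leakage.
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-8
        w = fit_logistic((X[train] - mu) / sd, site[train])
        accs.append(np.mean(predict(w, (X[test] - mu) / sd) == site[test]))
    return float(np.mean(accs))


if __name__ == "__main__":
    X, site = make_synthetic_sites(site_shift=0.5)
    print("site accuracy with shift:   ", site_classification_accuracy(X, site))
    X0, site0 = make_synthetic_sites(site_shift=0.0)
    print("site accuracy without shift:", site_classification_accuracy(X0, site0))
```

In this toy setting, accuracy near 0.5 would indicate that the two sites are indistinguishable from the features, while accuracy well above 0.5 signals residual site information. The same check could in principle be run on features before and after a harmonization step to see whether the site signal has actually been removed.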
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging Techniques and Applications · Digital Radiography and Breast Imaging
