How You Split Matters: Data Leakage and Subject Characteristics Studies in Longitudinal Brain MRI Analysis
Dewinda Julianensi Rumala

TL;DR
This paper examines how improper data splitting strategies can cause data leakage in longitudinal brain MRI analysis using 3D CNNs, emphasizing the importance of subject-wise splits for reliable model evaluation.
Contribution
It highlights the impact of data splitting choices on model performance and demonstrates the necessity of subject-wise splitting to prevent data leakage in longitudinal MRI studies.
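The paper's own splitting code is not reproduced on this page. The following is a minimal, hypothetical sketch in Python (standard library only) of the distinction it draws between scan-wise and subject-wise splitting. It assumes each scan is represented as a `(scan_id, subject_id)` pair. The function names `subject_wise_folds`, `scan_wise_folds`, and `leaked_subjects` are illustrative, not taken from the paper. A subject-wise split shuffles and assigns whole subjects to folds, so no subject can appear in both training and test data. A scan-wise split shuffles individual scans, so repeated visits from the same subject usually end up on both sides.

```python
"""Hypothetical sketch: subject-wise vs. scan-wise k-fold splits for longitudinal scans."""
import random
from collections import defaultdict


def subject_wise_folds(scans, k, seed=0):
    """Split (scan_id, subject_id) pairs into k folds, keeping each subject's scans together."""
    by_subject = defaultdict(list)
    for scan_id, subject_id in scans:
        by_subject[subject_id].append(scan_id)
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)  # shuffle subjects, not individual scans
    folds = [[] for _ in range(k)]
    for i, subject in enumerate(subjects):
        folds[i % k].extend(by_subject[subject])  # all visits of a subject land in one fold
    return folds


def scan_wise_folds(scans, k, seed=0):
    """Naive split: shuffle individual scans, ignoring which subject they belong to."""
    scan_ids = [scan_id for scan_id, _ in scans]
    random.Random(seed).shuffle(scan_ids)
    return [scan_ids[i::k] for i in range(k)]


def leaked_subjects(folds, scans, test_fold):
    """Return subjects that appear in both the test fold and the training folds."""
    subject_of = dict(scans)
    test = {subject_of[s] for s in folds[test_fold]}
    train = {subject_of[s] for i, fold in enumerate(folds) if i != test_fold for s in fold}
    return test & train


if __name__ == "__main__":
    # Toy cohort: 10 subjects with 3 longitudinal visits each (hypothetical IDs).
    scans = [(f"sub{s:02d}_visit{v}", f"sub{s:02d}") for s in range(10) for v in range(3)]
    for name, split in [("subject-wise", subject_wise_folds), ("scan-wise", scan_wise_folds)]:
        folds = split(scans, k=5)
        leaks = sum(len(leaked_subjects(folds, scans, t)) for t in range(5))
        print(f"{name}: {leaks} subject overlaps across 5 test folds")
```

On this toy cohort, the subject-wise split reports zero overlaps by construction. The scan-wise split will almost always report several, which is the leakage the paper warns about. In practice, scikit-learn's `GroupKFold` (passing subject IDs as `groups`) implements the same grouping idea.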
Findings
Subject-wise splitting reduces data leakage.
Improper splits can lead to overly optimistic performance.
GradCAM reveals identity confounding in CNN models.
Abstract
Deep learning models have revolutionized the field of medical image analysis, offering significant promise for improved diagnostics and patient care. However, their performance can be misleadingly optimistic due to a hidden pitfall called 'data leakage'. In this study, we investigate data leakage in 3D medical imaging, specifically using 3D Convolutional Neural Networks (CNNs) for brain MRI analysis. While 3D CNNs appear less prone to leakage than their 2D counterparts, improper data splitting during cross-validation (CV) can still pose issues, especially with longitudinal imaging data containing repeated scans from the same subject. We explore the impact of different data splitting strategies on model performance for longitudinal brain MRI analysis and identify potential data leakage concerns. GradCAM visualization helps reveal shortcuts in CNN models caused by identity confounding, where…
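For reference, the standard Grad-CAM formulation (Selvaraju et al.) weights each feature map $A^k$ of a chosen convolutional layer by the spatially averaged gradient of the pre-softmax class score $y^c$. The version below sums over three spatial indices $(i, j, l)$. This is an assumed extension to volumetric feature maps, not necessarily the exact variant used in the paper. $Z$ denotes the number of voxels in each feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \sum_{l} \frac{\partial y^c}{\partial A^k_{ijl}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_k \alpha_k^c \, A^k \right)
```

In a model that relies on identity confounding, the resulting map would be expected to highlight subject-specific anatomy rather than regions related to the target label.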
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Medical Imaging Techniques and Applications
