Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging -- A Study on Pretraining for Age Prediction
Yanteng Zhang, Songheng Li, Zeyu Shen, Qizhen Lan, Lipei Zhang, Yang Liu, Vince Calhoun

TL;DR
This paper investigates how data quality in neuroimaging affects pretraining for brain age prediction, highlighting the importance of domain-aware data curation to improve model generalization and trustworthiness.
Contribution
The paper systematically analyzes how neuroimaging data quality affects pretraining effectiveness and emphasizes the need for domain-specific data curation practices.
Findings
High-quality scans lead to better pretraining performance.
Low-quality or distorted scans can hinder model learning.
Domain-aware curation improves model generalization.
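One simple form of quality-aware curation is to split a cohort into quality tiers before pretraining, so that each tier can be used as a separate pretraining set. The sketch below shows only that grouping step. It is not the authors' pipeline. The `Scan` record, the `qc_score` field in [0, 1], and the tier cut-offs are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-scan record. `qc_score` is an ASSUMED quality-control score in [0, 1]
# (e.g. from visual rating or an automated QC tool); the paper does not define this field.
@dataclass
class Scan:
    subject_id: str
    qc_score: float

# Hypothetical cut-offs: scores >= 0.8 are "high", >= 0.5 are "medium", the rest "low".
THRESHOLDS = (("high", 0.8), ("medium", 0.5))

def quality_tier(score: float) -> str:
    """Map a QC score to a quality tier using the illustrative thresholds above."""
    for name, cutoff in THRESHOLDS:
        if score >= cutoff:
            return name
    return "low"

def split_by_quality(scans: List[Scan]) -> Dict[str, List[Scan]]:
    """Group scans into quality tiers so each tier can serve as its own pretraining set."""
    tiers: Dict[str, List[Scan]] = {"high": [], "medium": [], "low": []}
    for scan in scans:
        tiers[quality_tier(scan.qc_score)].append(scan)
    return tiers

if __name__ == "__main__":
    cohort = [Scan("s1", 0.95), Scan("s2", 0.62), Scan("s3", 0.10), Scan("s4", 0.81)]
    tiers = split_by_quality(cohort)
    print({name: [s.subject_id for s in group] for name, group in tiers.items()})
    # {'high': ['s1', 's4'], 'medium': ['s2'], 'low': ['s3']}
```

Fixed thresholds keep the example readable. In practice the tier boundaries, and whether scores are continuous or categorical ratings, would depend on the QC procedure actually used.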
Abstract
Large-scale brain imaging datasets provide unprecedented opportunities for developing domain foundation models through pretraining. However, unlike natural image datasets in computer vision, neuroimaging data often vary widely in quality, ranging from well-structured scans to severely distorted or incomplete brain volumes. This raises a fundamental question: can noisy or low-quality scans contribute meaningfully to pretraining, or do they instead hinder model learning? In this study, we systematically explore the role of data quality in pretraining and its impact on downstream tasks. Specifically, we pretrain models on datasets of different quality levels and fine-tune them for brain age prediction on external cohorts. Our results show significant performance differences across quality levels, revealing both opportunities and limitations. We further…
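Brain age prediction is commonly scored by the mean absolute error (MAE, in years) between predicted and chronological age, often alongside the per-subject "brain age gap" (predicted minus chronological age). The abstract does not state which metric the authors report, so using MAE here is an assumption. The sketch compares two hypothetical fine-tuned models on the same toy external cohort. All ages and predictions below are made-up illustrative values, not results from the paper.

```python
from typing import List, Sequence

def mean_absolute_error(predicted: Sequence[float], chronological: Sequence[float]) -> float:
    """Average absolute difference (in years) between predicted and chronological ages."""
    if not predicted or len(predicted) != len(chronological):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - c) for p, c in zip(predicted, chronological)) / len(predicted)

def brain_age_gap(predicted: Sequence[float], chronological: Sequence[float]) -> List[float]:
    """Per-subject gap: positive means the model predicts an 'older' brain."""
    return [p - c for p, c in zip(predicted, chronological)]

if __name__ == "__main__":
    # Toy external-cohort ages and predictions from two hypothetical fine-tuned models
    # (e.g. pretrained on different quality tiers). Values are illustrative only.
    true_ages = [60.0, 47.0, 70.0]
    predictions = {
        "model_a": [62.0, 45.5, 70.0],
        "model_b": [66.0, 41.0, 75.0],
    }
    for name, preds in predictions.items():
        print(name, round(mean_absolute_error(preds, true_ages), 2))
    # model_a 1.17
    # model_b 5.67
    print(brain_age_gap(predictions["model_a"], true_ages))  # [2.0, -1.5, 0.0]
```

Holding the evaluation cohort fixed and varying only the pretraining data is what makes such a comparison informative about data quality.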
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Domain Adaptation and Few-Shot Learning · Functional Brain Connectivity Studies
