Empirical investigation of multi-source cross-validation in clinical ECG classification
Tuija Leinonen, David Wong, Antti Vasankari, Ali Wahab, Ramesh Nadarajah, Matti Kaisti, Antti Airola

TL;DR
This study empirically compares cross-validation methods in multi-source ECG classification, revealing that leave-source-out cross-validation offers more realistic performance estimates than standard K-fold methods, which tend to be overly optimistic.
Contribution
It systematically evaluates cross-validation strategies in multi-source medical data, demonstrating the advantages of leave-source-out validation for realistic performance assessment.
Findings
K-fold cross-validation overestimates accuracy for new sources.
Leave-source-out validation provides less biased estimates.
Multi-source data improves evaluation reliability.
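The difference between the two splitting schemes can be made concrete with a minimal sketch. It is not the authors' code: the function names `k_fold_splits` and `leave_source_out_splits`, the source labels "A", "B", "C", and the sample counts are all hypothetical, chosen only to illustrate the idea. In K-fold, every test fold typically contains sources that also appear in training; in leave-source-out, the held-out source never does.

```python
import random
from collections import defaultdict


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for a shuffled K-fold split.

    Samples are assigned to folds at random, ignoring their source.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def leave_source_out_splits(sources):
    """Yield (held_out_source, train_idx, test_idx), one split per source.

    All samples from the held-out source go to the test set, so the
    model is always evaluated on a source it has not seen in training.
    """
    by_source = defaultdict(list)
    for idx, src in enumerate(sources):
        by_source[src].append(idx)
    for held_out in sorted(by_source):
        test = by_source[held_out]
        train = [i for i, s in enumerate(sources) if s != held_out]
        yield held_out, train, test


# Hypothetical dataset: 12 recordings from three sources (e.g. hospitals).
sources = ["A"] * 4 + ["B"] * 3 + ["C"] * 5

for train, test in k_fold_splits(len(sources), k=3):
    shared = {sources[i] for i in test} & {sources[i] for i in train}
    print("K-fold: test sources also seen in training:", sorted(shared))

for held_out, train, test in leave_source_out_splits(sources):
    seen = {sources[i] for i in train}
    print(f"Leave-source-out: held out {held_out}, training sources {sorted(seen)}")
```

In practice, a library routine such as scikit-learn's `LeaveOneGroupOut` would do the same job with the source labels passed as groups; the stdlib version here only shows the mechanics.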
Abstract
Traditionally, machine learning-based clinical prediction models have been trained and evaluated on patient data from a single source, such as a hospital. Cross-validation methods can be used to estimate the accuracy of such models on new patients originating from the same source, by repeated random splitting of the data. However, such estimates tend to be highly overoptimistic when compared to accuracy obtained from deploying models to sources not represented in the dataset, such as a new hospital. The increasing availability of multi-source medical datasets provides new opportunities for obtaining more comprehensive and realistic evaluations of expected accuracy through source-level cross-validation designs. In this study, we present a systematic empirical evaluation of standard K-fold cross-validation and leave-source-out cross-validation methods in a multi-source setting. We…
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
