CohortFinder: an open-source tool for data-driven partitioning of biomedical image cohorts to yield robust machine learning models
Fan Fan, Georgia Martinez, Thomas Desilvio, John Shin, Yijiang Chen, Bangchen Wang, Takaya Ozeki, Maxime W. Lafarge, Viktor H. Koelzer, Laura Barisoni, Anant Madabhushi, Satish E. Viswanath, Andrew Janowczyk

TL;DR
CohortFinder is an open-source tool designed to mitigate batch effects in biomedical imaging data by data-driven cohort partitioning, thereby improving machine learning model robustness and generalizability.
Contribution
The paper introduces CohortFinder, a novel open-source tool that effectively reduces batch effects in biomedical image datasets to enhance ML model performance.
Findings
CohortFinder improves ML model performance in downstream medical image processing tasks.
The tool effectively mitigates batch effects in biomedical datasets.
Open-source availability facilitates widespread adoption.
Abstract
Batch effects (BEs) are systematic technical differences in data collection that are unrelated to biological variation; the noise they introduce has been shown to negatively impact machine learning (ML) model generalizability. Here we release CohortFinder, an open-source tool aimed at mitigating BEs via data-driven cohort partitioning. We demonstrate that CohortFinder improves ML model performance in downstream medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
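The abstract does not spell out how the partitioning works, so the following is only a minimal sketch of the general idea, not CohortFinder's actual code or API. It assumes each image has already been assigned a batch-effect group label (for example, by clustering image-level quality metrics, which is an assumption here), and it splits every group between training and testing so that neither set is dominated by a single batch. The function name `partition_by_group` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def partition_by_group(group_of, test_fraction=0.3, seed=0):
    """Split sample IDs into train/test so every group appears in both sets.

    group_of: dict mapping sample ID -> batch-effect group label.
              The labels are assumed to be precomputed and sortable.
    """
    rng = random.Random(seed)

    # Collect the sample IDs belonging to each batch-effect group.
    members = defaultdict(list)
    for sample_id, group in group_of.items():
        members[group].append(sample_id)

    train, test = [], []
    for group in sorted(members):
        ids = sorted(members[group])
        rng.shuffle(ids)
        if len(ids) >= 2:
            # Keep at least one sample on each side of the split.
            n_test = round(len(ids) * test_fraction)
            n_test = min(max(n_test, 1), len(ids) - 1)
        else:
            # A singleton group cannot be split; keep it for training.
            n_test = 0
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Hypothetical example: 12 images spread evenly over 3 batch-effect groups.
group_of = {f"img_{i:02d}": i % 3 for i in range(12)}
train, test = partition_by_group(group_of, test_fraction=0.25)
```

Splitting within each group, rather than across the pooled dataset, is what keeps every detected batch represented on both sides of the split; a purely random split could by chance place an entire batch in only one set.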
Taxonomy
Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Radiomics and Machine Learning in Medical Imaging
